[OSM-dev] Choking on the full-planet file

Eric Wolf ebwolf at gmail.com
Fri Feb 25 18:07:13 GMT 2011

Has anyone done much work with the OSM full-planet file? If so, can
you give me some hints? The file is just big enough to make life difficult.

What I'm trying to do is look at changes to the USGS GNIS data in OSM.
I want to convert node data for the US (probably just the lower 48
states) into a format that I can work on in ArcGIS. I want to have
access to historical data.

I've been trying to get Dominic Stubbins' OSM Simple Loader to do what
I want with mixed results. I can get it to read a standard planet file
fine. The script uses BZ2File in Python to read the compressed file
directly and generates an ESRI File Geodatabase. That all works fine
and dandy. I tweaked the code to handle the lack of line endings in
the full-planet file, but for some reason it just quits reading that
file after exactly 900,000 bytes. I can run the same code against a
10MB bz2 file and it works perfectly.
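
A minimal sketch of reading the compressed file in fixed-size chunks, so
that nothing depends on line endings. It also starts a fresh decompressor
whenever one bz2 stream ends. That part rests on an assumption not
confirmed here: that the full-planet file is a concatenation of several
bz2 streams, as parallel compressors such as pbzip2 produce, and that the
reader stops at the end of the first one. The 900,000-byte figure does
match bzip2's largest block size, which is why this seems worth ruling
out. The name iter_decompressed and the chunk size are illustrative, and
the eof attribute needs Python 3.3 or later.

```python
import bz2


def iter_decompressed(path, chunk_size=1024 * 1024):
    """Yield decompressed bytes from a .bz2 file, one piece at a time.

    Copes with files made of several concatenated bz2 streams by
    creating a new decompressor each time the current stream ends.
    """
    decomp = bz2.BZ2Decompressor()
    with open(path, "rb") as fh:
        while True:
            raw = fh.read(chunk_size)
            if not raw:
                break
            while raw:
                data = decomp.decompress(raw)
                if data:
                    yield data
                if decomp.eof:
                    # This stream is finished; any leftover bytes are the
                    # start of the next stream.
                    raw = decomp.unused_data
                    decomp = bz2.BZ2Decompressor()
                else:
                    raw = b""
```

Summing len(piece) over iter_decompressed(path) gives a quick check of
how far the read actually gets before it stops.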

My suspicion is that the bz2 module in 32-bit Python for Windows is
choking on the 22GB compressed full-planet file. I'm stuck in 32-bit
Windows because I want the File Geodatabase output and need to use
ESRI's Python module.
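
Before blaming the 32-bit build, it may help to confirm what the
interpreter actually is and whether the file is past the 32-bit signed
offset limit. A small sketch follows; runtime_report is a made-up helper
name, and it only reports facts, it does not prove where the bz2 module
fails.

```python
import os
import sys


def runtime_report(path):
    """Report interpreter word size and how the file size compares
    with the largest offset a signed 32-bit integer can hold."""
    size = os.path.getsize(path)
    return {
        "python_bits": 64 if sys.maxsize > 2**32 else 32,
        "file_bytes": size,
        "exceeds_signed_32bit": size > 2**31 - 1,
    }
```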

I'm going to try two things this morning:

1. Download the full-planet file to my 64-bit Ubuntu box and try using
bzcat + osmosis to extract just the US and then run the script to
build the FGDB against it.

2. Create a Python osm-extract that'll work against the full-planet
file using BZ2File but without any ESRI crap. That way I can run it on
64-bit platforms and hopefully not have BZ2File choke.
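
A rough sketch of what option 2 could look like, using only the standard
library of a recent Python 3 (bz2.open and XMLPullParser both postdate
this message). It feeds decompressed chunks to an incremental XML parser,
so line endings never matter, and keeps every version of each node that
falls inside a bounding box. The bounding box values for the lower 48
are approximate guesses, iter_nodes_in_bbox is a made-up name, and
versions without lat/lon attributes are simply skipped.

```python
import bz2
import xml.etree.ElementTree as ET

# Approximate box for the lower 48 states (assumed values, adjust as needed):
# min_lon, min_lat, max_lon, max_lat
LOWER_48 = (-125.0, 24.0, -66.0, 50.0)


def iter_nodes_in_bbox(path, bbox=LOWER_48, chunk_size=1024 * 1024):
    """Yield (attributes, tags) for each node version inside bbox."""
    min_lon, min_lat, max_lon, max_lat = bbox
    parser = ET.XMLPullParser(events=("start", "end"))
    root = None
    depth = 0
    with bz2.open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            parser.feed(chunk)
            for event, elem in parser.read_events():
                if event == "start":
                    depth += 1
                    if depth == 1:
                        root = elem  # the enclosing <osm> element
                    continue
                depth -= 1
                if depth != 1:
                    continue  # only act on completed children of <osm>
                if elem.tag == "node" and "lat" in elem.attrib and "lon" in elem.attrib:
                    lat = float(elem.get("lat"))
                    lon = float(elem.get("lon"))
                    if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
                        tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
                        yield dict(elem.attrib), tags
                # Drop finished elements so memory use stays flat.
                root.clear()
```

Each yielded pair could then be written out as a row of CSV or similar
for loading into ArcGIS on the Windows side.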

My final move will be to go buy a 2TB drive and just work with the
uncompressed file. Another option is to fire up a big Amazon EC2
instance, but moving these files around is a PITA. I'd prefer to
improve the code base for working with the planet files. In
particular, some base tools that read the bz2 files directly and
don't care about end-of-line would be handy.


Eric B. Wolf                           720-334-7734
