[OSM-dev] Planet to Shapefiles

Christopher Schmidt crschmidt at metacarta.com
Mon Aug 28 02:04:18 BST 2006


> Could we short-circuit the GML? If I have the time (there's freemap stuff I
> need to do for the SoC conference; however it's more likely I can do something
> with it after that) I can have a go at doing something which goes straight
> from planet to shapefiles. However I would need to research shapefiles first
> so it wouldn't be a one-day job.

It wouldn't help anyway. The part that causes problems is not converting
from GML to shapefiles, but converting from a topological format to a
topographical one. That requires building a large list of nodes and
storing it somewhere, so that you can look the nodes up later to build
the actual locations of segments.
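
As a rough illustration of that lookup step (this is not the attached
script), here is a minimal Python sketch. It assumes planet XML in which
each <node> carries id/lat/lon and each <segment> refers to two node ids
through "from" and "to" attributes; the function name resolve_segments
and the sample data are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the attribute layout is an assumption.
SAMPLE = """<osm>
  <node id="1" lat="51.5" lon="-0.1"/>
  <node id="2" lat="51.6" lon="-0.2"/>
  <segment id="10" from="1" to="2"/>
</osm>"""


def resolve_segments(xml_text):
    """Turn node-id references into actual coordinates."""
    root = ET.fromstring(xml_text)

    # Pass 1: remember every node's position. This dict is the
    # "large list of nodes" that eats memory on a full planet file.
    nodes = {}
    for node in root.iter("node"):
        nodes[node.get("id")] = (float(node.get("lat")),
                                 float(node.get("lon")))

    # Pass 2: replace each segment's node ids with coordinates.
    lines = []
    for seg in root.iter("segment"):
        start = nodes.get(seg.get("from"))
        end = nodes.get(seg.get("to"))
        if start is None or end is None:
            continue  # segment refers to a node we never saw
        lines.append((seg.get("id"), start, end))
    return lines


if __name__ == "__main__":
    for seg_id, start, end in resolve_segments(SAMPLE):
        print(seg_id, start, end)
```

The sketch parses the whole document at once for brevity; a real run
over a planet file would need a streaming parser, but the node table
would still have to be kept around somewhere.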

However, there is a slower but less memory-intensive solution to this
problem: instead of using a standard Python hash (a dict), one can use
a bdb (Berkeley DB) hash. This is significantly slower, but it scales
far beyond the current level, because the limit becomes disk space
rather than RAM. Using the July Planet dump as an example, instead of
taking 700MB of RAM to store the segments, they are stored on disk in
80MB. This saving of nearly an order of magnitude is compounded by the
fact that disk is typically far more plentiful than RAM.
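
The idea can be sketched as follows. This is not the attached script:
to stay within the Python standard library it uses dbm.dumb as a
stand-in for a Berkeley DB hash, and the helper names store_nodes and
lookup, as well as the "lat,lon" value encoding, are assumptions made
for the example.

```python
import dbm.dumb
import os
import tempfile


def store_nodes(db, nodes):
    """Write (id, lat, lon) tuples into the on-disk hash."""
    for node_id, lat, lon in nodes:
        # Keys and values must be bytes, so encode the position as text.
        db[str(node_id).encode("ascii")] = ("%r,%r" % (lat, lon)).encode("ascii")


def lookup(db, node_id):
    """Return (lat, lon) for a node id, or None if it was never stored."""
    key = str(node_id).encode("ascii")
    if key not in db:
        return None
    lat, lon = db[key].decode("ascii").split(",")
    return float(lat), float(lon)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "nodes")
    db = dbm.dumb.open(path, "c")  # "c": create the file if it is missing
    try:
        store_nodes(db, [(1, 51.5, -0.1), (2, 51.6, -0.2)])
        print(lookup(db, 1), lookup(db, 2), lookup(db, 3))
    finally:
        db.close()
```

Because the object behaves like a dict, the rest of a conversion script
would barely change; only the place where the table is created does.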

A script which uses this improvement is attached. Timing the run for the
July Planet file gives:

crschmidt at merrimack:~$ time python osm2gml_simple.py < planet.200607.1.osm > outputfile2.gml

real    20m43.992s
user    10m2.758s
sys     0m25.922s

So, it's slower, but should now scale linearly. I'm running it on the
newest dump now: will report back findings in the morning.

-- Chris
-------------- next part --------------
A non-text attachment was scrubbed...
Name: osm2gml_simple.py
Type: text/x-python
Size: 4674 bytes
Desc: not available
URL: <http://lists.openstreetmap.org/pipermail/dev/attachments/20060827/86ca35d5/attachment.py>

