[OSM-dev] Help needed for processing the planet.osm (for osmdoc.com)
Chris Miller
chris.miller at kbcfp.com
Mon Aug 24 13:22:41 BST 2009
Hi Lars,
While it's not solving exactly the same problem as you, the mkgmap splitter
utility is faced with similar challenges. It is written in Java and uses
various techniques to reduce the amount of memory required while processing
the planet.osm file. I've spent quite a bit of time profiling and tuning it, so
hopefully there are some ideas (or code) in there that can help you out.
For example there are some custom collection-like classes for efficiently
holding primitives, bit-level storage of data, and conditional use of different
data structures depending on whether a common case or an uncommon case is
encountered. Quite a bit of effort has also been put into avoiding unnecessary
object construction. Additionally, I checked in an update yesterday that
creates a disk cache after parsing the planet file for the first time. After
that it reads from this cache rather than making multiple passes over the
planet XML file.
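
To illustrate the kind of primitive-holding collection and bit-level packing described above, here is a minimal sketch. It is not taken from the splitter's source: the class name PackedCoordList, its methods, and the choice of 1e-7 degree fixed-point units are all assumptions made for this example. It stores each latitude/longitude pair in a single long inside a growable long[], so no per-node object is allocated.

```java
import java.util.Arrays;

/** Hypothetical example class, not part of the splitter. */
final class PackedCoordList {
    private long[] data = new long[16];
    private int size;

    /** Converts degrees to fixed-point units of 1e-7 degrees (assumed precision). */
    static int toFixed(double degrees) {
        return (int) Math.round(degrees * 1e7);
    }

    /** Packs two 32-bit values into one long: lat in the high half, lon in the low half. */
    static long pack(int lat, int lon) {
        return ((long) lat << 32) | (lon & 0xFFFFFFFFL);
    }

    static int lat(long packed) { return (int) (packed >> 32); }
    static int lon(long packed) { return (int) packed; }

    /** Appends a coordinate pair, doubling the backing array when it is full. */
    void add(int lat, int lon) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = pack(lat, lon);
    }

    int latAt(int i) { return lat(data[i]); }
    int lonAt(int i) { return lon(data[i]); }
    int size() { return size; }
}
```

Each entry costs 8 bytes in this layout, whereas a list of small coordinate objects would also pay for an object header and a reference per node.
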
My suggestion is that you try doing something similar: make one pass over
the XML that writes out the data to a custom binary format. Then you'll be
able to make multiple passes over the data much more quickly, processing
a subset of the data each time. You can choose an appropriately sized subset
of the data depending on how much you want to trade off speed against memory
use (that's exactly what the --max-areas parameter does with the splitter).
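
The one-pass-then-cache approach suggested above can be sketched as follows. This is only an illustration under stated assumptions: the 16-byte record layout (id, lat, lon), the class name NodeCache, and the longitude-range filter are invented for the example and are not the splitter's actual cache format. The XML parsing step is omitted; writeNode would be called from whatever parser callback handles each node.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.LongConsumer;

/** Hypothetical example class, not part of the splitter. */
final class NodeCache {
    /** Size of one record: 8-byte id + 4-byte lat + 4-byte lon. */
    static final int RECORD_BYTES = 16;

    /** Appends one node as a fixed-size binary record. */
    static void writeNode(DataOutputStream out, long id, int lat, int lon) throws IOException {
        out.writeLong(id);
        out.writeInt(lat);
        out.writeInt(lon);
    }

    /**
     * Makes one sequential pass over the cache and hands the sink the id of
     * every node whose longitude lies in [minLon, maxLon).
     */
    static void pass(Path cache, int minLon, int maxLon, LongConsumer sink) throws IOException {
        long records = Files.size(cache) / RECORD_BYTES;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(cache)))) {
            for (long i = 0; i < records; i++) {
                long id = in.readLong();
                in.readInt(); // latitude is not needed for this filter
                int lon = in.readInt();
                if (lon >= minLon && lon < maxLon) {
                    sink.accept(id);
                }
            }
        }
    }
}
```

Calling pass several times with different ranges processes one subset per pass: wider ranges mean fewer passes but more data held in memory at once.
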
You can grab the splitter from here if you want to take a look:
http://www.mkgmap.org.uk/page/tile-splitter
I've also worked on other similar problems at my job where I've used in-memory
compression of data to greatly reduce the RAM required. This approach depends
a lot on being able to find a good way to exploit any redundancy in the particular
data you're working with.
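
As one possible illustration of exploiting redundancy, the sketch below compresses a sorted array of IDs by storing the gap between consecutive values as a variable-length integer. Delta-plus-varint encoding is a technique chosen for this example, not necessarily the one referred to above, and the class name DeltaVarint is hypothetical. It assumes the IDs are non-negative and sorted in ascending order, so neighbouring values are close together and most gaps fit in one or two bytes instead of eight.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical example class; assumes non-negative ids sorted ascending. */
final class DeltaVarint {
    /** Encodes each id as the gap from the previous id, 7 bits per byte. */
    static byte[] encode(long[] sortedIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long prev = 0;
        for (long id : sortedIds) {
            long delta = id - prev;
            // Emit low 7 bits first; the high bit marks "more bytes follow".
            while ((delta & ~0x7FL) != 0) {
                out.write((int) ((delta & 0x7F) | 0x80));
                delta >>>= 7;
            }
            out.write((int) delta);
            prev = id;
        }
        return out.toByteArray();
    }

    /** Reverses encode(); count is the number of ids that were encoded. */
    static long[] decode(byte[] bytes, int count) {
        long[] ids = new long[count];
        int pos = 0;
        long prev = 0;
        for (int i = 0; i < count; i++) {
            long delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta;
            ids[i] = prev;
        }
        return ids;
    }
}
```

How much this saves depends entirely on the data: densely clustered IDs compress well, while widely scattered ones gain little.
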
I'm happy to discuss this further with you offline if you like.
Chris