[OSM-dev] Help needed for processing the planet.osm (for osmdoc.com)

Ben Supnik bsupnik at xsquawkbox.net
Mon Aug 24 13:41:34 BST 2009


hi Chris,

This may be a little bit useless, but I use multi-pass and tiling 
techniques to get the planet under control for my work. I'm using C, 
but in a 32-bit process, so these techniques should work with Java too.

- When I split the planet, I use a very compact spatial index to make 
sure I don't lose pieces (e.g. I maintain bounding boxes). This 
ensures that I take nodes outside a clipping box if the way passes 
through the clipping box.  With bit-packing, the indices (barely) fit in 
memory.

- The actual output is done in multiple passes, N files at a time, based 
on the max number of file descriptors.  Since I am tiling to 1x1 degree 
tiles (64,800 of them) and my OS gives me 1024 file descriptors, this 
part is hellishly slow for me, but splitting to a more reasonable size 
(say 648 10x10 degree blocks) would run in a single pass.

- If your data can be aggregated later, tiling would get you out of 
jail: tile the planet yourself into a smaller number of chunks that can 
then happily be "tag indexed" entirely in RAM using your existing code. 
Then merge together the final tag counting results, which can 
hopefully be merged easily and cheaply.

- You might not need to grab nodes from outside a tiling box just 
because they are attached to a way. This would make tiling easier than 
it is for other programs that care more about spatial correctness.

- One last idea: split the tag space and then merge that again.  Do one 
planet pass to collect all known tags into a hash table.  Then split 
this "planet schema" into several pieces by the maximum number of tags 
you want, and run a separate count pass over the planet for each piece, 
counting only the tags in that piece.  Heck, you could even "filter" the 
planet down to those tags with something like grep so the planet XML 
parse goes by fast in each pass.  Either way it's the same idea: cut the 
dataset in a known way and glue together your counting later.
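
To make the multi-pass output idea above concrete, here is a minimal 
sketch in Python (not my actual C code). The names tile_id, plan_passes 
and split_nodes are made up for illustration, and plain lists stand in 
for the per-tile output files so the example stays self-contained. It 
assumes nodes arrive as (id, lat, lon) tuples that can be scanned once 
per pass.

```python
def tile_id(lat, lon, size_deg):
    """Map a coordinate to a (row, col) tile on a size_deg x size_deg grid."""
    rows, cols = 180 // size_deg, 360 // size_deg
    # Clamp so that lat == 90 and lon == 180 fall into the last tile.
    row = min(int((lat + 90.0) // size_deg), rows - 1)
    col = min(int((lon + 180.0) // size_deg), cols - 1)
    return row, col


def plan_passes(size_deg, max_open_files):
    """Group all tiles into batches that fit under the open-file limit."""
    rows, cols = 180 // size_deg, 360 // size_deg
    tiles = [(r, c) for r in range(rows) for c in range(cols)]
    return [tiles[i:i + max_open_files]
            for i in range(0, len(tiles), max_open_files)]


def split_nodes(nodes, size_deg, max_open_files):
    """One scan of the input per batch; only that batch's tiles are 'open'."""
    out = {}
    for batch in plan_passes(size_deg, max_open_files):
        wanted = set(batch)
        for node_id, lat, lon in nodes:
            tile = tile_id(lat, lon, size_deg)
            if tile in wanted:
                # A real splitter would write to the tile's file here.
                out.setdefault(tile, []).append(node_id)
    return out
```

With a limit of 1024, plan_passes(10, 1024) yields a single batch of 648 
tiles, while plan_passes(1, 1024) yields 64 batches, i.e. 64 scans of 
the planet. In practice a few descriptors are already taken by the 
process, so the usable limit is a bit lower.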

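The "cut the dataset, count, then glue together" idea can be sketched 
the same way. This is only an illustration under assumptions: each 
element is reduced to a dict of its tags, only tag keys are counted, and 
the function names are hypothetical. It shows both cuts, by chunk (tile) 
and by tag subset, producing the same totals.

```python
from collections import Counter


def count_tags(elements, wanted=None):
    """Count tag keys in one chunk; if 'wanted' is given, count only those keys."""
    counts = Counter()
    for tags in elements:
        if wanted is None:
            counts.update(tags.keys())
        else:
            counts.update(k for k in tags if k in wanted)
    return counts


def merge_counts(partials):
    """Glue partial counts together by summing them key by key."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Two toy "tiles" standing in for chunks of the planet.
chunk_a = [{"highway": "residential", "name": "A"}, {"highway": "primary"}]
chunk_b = [{"building": "yes"}, {"highway": "service", "building": "yes"}]

# Cut by tile: count each chunk on its own, then merge.
by_tile = merge_counts([count_tags(chunk_a), count_tags(chunk_b)])

# Cut by tag space: one pass over everything per subset of keys.
everything = chunk_a + chunk_b
by_tags = merge_counts([
    count_tags(everything, wanted={"highway"}),
    count_tags(everything, wanted={"name", "building"}),
])
```

Note the tile version only gives correct totals if every element lands 
in exactly one chunk. If nodes are duplicated across tiles because a way 
crosses a boundary, they would be counted twice.
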
Hope that helps...
Ben




Chris Miller wrote:
> Hi Lars,
> 
> While it's not solving exactly the same problem as you, the mkgmap splitter 
> utility is faced with similar challenges. It is written in Java and uses 
> various techniques to reduce the amount of memory required while processing 
> the planet osm. I've spent quite a bit of time profiling and tuning it, so 
> hopefully there are some ideas (or code) in there that can help you out. 
> For example there are some custom collection-like classes for efficiently 
> holding primitives, bit-level storage of data, and conditional use of different 
> data structures depending on whether a common case or an uncommon case is 
> encountered. Quite a bit of effort has also been put in to avoiding unnecessary 
> object construction. Additionally, I checked in an update yesterday that 
> creates a disk cache after parsing the planet file for the first time. After 
> that it reads from this cache rather than making multiple passes over the 
> planet XML file.
> 
> My suggestion is that you try doing something similar; make one pass over 
> the XML that writes out the data to a custom binary format. Then you'll be 
> able to make multiple passes over the data much more quickly, processing 
> a subset of the data each time. You can choose an appropriate sized subset 
> of the data depending on how much you want to trade off speed vs memory usage 
> (that's exactly what the --max-areas parameter does with the splitter).
> 
> You can grab the splitter from here if you want to take a look:
> 
> http://www.mkgmap.org.uk/page/tile-splitter
> 
> I've also worked on other similar problems at my job where I've used in-memory 
> compression of data to greatly reduce the RAM required. This approach depends 
> a lot on being able to find a good way to exploit any redundancy in the particular 
> data you're working with.
> 
> I'm happy to discuss this further with you offline if you like.
> 
> Chris

-- 
Scenery Home Page: http://scenery.x-plane.com/
Scenery blog: http://xplanescenery.blogspot.com/
Plugin SDK: http://www.xsquawkbox.net/xpsdk/
X-Plane Wiki: http://wiki.x-plane.com/
Scenery mailing list: x-plane-scenery at yahoogroups.com
Developer mailing list: x-plane-dev at yahoogroups.com