[OSM-dev] [OSM-talk] Osmosis error with multiple bounding boxes

Brett Henderson brett at bretth.com
Fri Oct 26 10:48:50 BST 2007

Lambertus wrote:
> The current method of tee is indeed not truly scalable ;-) With 100 
> pipes and -Xmx2048m the processing is still in the clear, but with 200 
> Osmosis quits with 'out of heap memory' after processing the nodes (no 
> ways exported yet).
That all depends on your definition of scalable; I'm impressed that it 
made it that far :-)  How many bounding boxes do you need to create?  If 
it's only in the hundreds, then invoking osmosis several times is likely 
to give you the best performance.  Otherwise we can look into persistence.
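
To illustrate the "invoke osmosis several times" option, here is a 
minimal sketch that splits a list of bounding boxes into batches and 
builds one command line per batch.  BoxBatcher, Box, partition and 
commandFor are hypothetical names made up for this example; they are not 
part of Osmosis.  The task and argument names (--read-xml, --tee 
outputCount, --bounding-box left/right/top/bottom, --write-xml file) 
follow the Osmosis usage documentation as I understand it, so check them 
against the version you are running.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: batches bounding boxes into separate osmosis runs. */
final class BoxBatcher {

    /** Hypothetical value type describing one output extract. */
    static final class Box {
        final String name;
        final double left, bottom, right, top;

        Box(String name, double left, double bottom, double right, double top) {
            this.name = name;
            this.left = left;
            this.bottom = bottom;
            this.right = right;
            this.top = top;
        }
    }

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    /** Builds the argument list for one run: read once, tee, one box per pipe. */
    static List<String> commandFor(String inputFile, List<Box> batch) {
        List<String> args = new ArrayList<>();
        args.add("osmosis");
        args.add("--read-xml");
        args.add("file=" + inputFile);
        args.add("--tee");
        args.add("outputCount=" + batch.size());
        for (Box b : batch) {
            args.add("--bounding-box");
            args.add("left=" + b.left);
            args.add("right=" + b.right);
            args.add("top=" + b.top);
            args.add("bottom=" + b.bottom);
            args.add("--write-xml");
            args.add("file=" + b.name + ".osm");
        }
        return args;
    }

    public static void main(String[] argv) {
        List<Box> boxes = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            boxes.add(new Box("tile" + i, i, 50.0, i + 1.0, 51.0));
        }
        // Two boxes per run here; in practice the batch size would be
        // whatever fits in the available heap.
        for (List<Box> batch : partition(boxes, 2)) {
            System.out.println(String.join(" ", commandFor("planet.osm", batch)));
        }
    }
}
```

Each run re-reads the input file, so the batch size trades parse time 
against heap usage: larger batches mean fewer passes but more memory.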

Having thought about it a bit more, my previous suggestion of persisting 
bit sets to disk may not be a great idea.  The files are going to be very 
large, and with random access patterns your disk will spend all of its 
time seeking, because we already know they're not going to be fully 
cached in RAM.  It could be worth a try because it should be relatively 
simple to implement with the IndexStore class; I'm just not optimistic 
about its performance.  A smarter index would be more appropriate, one 
that caters to the fact that the selected ids within a bounding box are 
going to be very sparse.  In other words, the number of ids within the 
box is small compared to the number outside it.
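
One possible shape for such a sparse index, purely as a sketch: keep 
only the selected ids in a sorted array and look them up with a binary 
search, so memory grows with the number of ids inside the box rather 
than with the largest id in the file.  SparseIdSet and its methods are 
hypothetical names for illustration; this is not an existing Osmosis 
class and not necessarily how it would be done there.

```java
import java.util.Arrays;

/** Hypothetical sparse id set: stores only the ids that were selected. */
final class SparseIdSet {
    private long[] ids = new long[16];
    private int size = 0;
    private boolean sorted = true;

    /** Records an id; the caller is assumed not to add the same id twice. */
    void add(long id) {
        if (size == ids.length) {
            ids = Arrays.copyOf(ids, size * 2);
        }
        // Ids arriving in ascending order never need a sort.
        if (size > 0 && id < ids[size - 1]) {
            sorted = false;
        }
        ids[size++] = id;
    }

    /** Returns true if the id was added; sorts lazily before searching. */
    boolean contains(long id) {
        if (!sorted) {
            Arrays.sort(ids, 0, size);
            sorted = true;
        }
        return Arrays.binarySearch(ids, 0, size, id) >= 0;
    }

    /** Number of ids stored. */
    int size() {
        return size;
    }

    public static void main(String[] args) {
        SparseIdSet selectedNodes = new SparseIdSet();
        selectedNodes.add(42L);
        selectedNodes.add(7L);
        selectedNodes.add(1000000007L);
        System.out.println(selectedNodes.contains(7L));
        System.out.println(selectedNodes.contains(8L));
    }
}
```

This costs 8 bytes per selected id, compared with one bit per possible 
id for a bit set, so it only pays off when the selection really is 
sparse.  Lookups are O(log n) instead of O(1).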

The store and forward approach isn't likely to help much either; the 
Java XML parser is likely to be faster than reading from temp files.  
I'm sure there are ways of speeding up the temp files further, but it's 
not trivial.

If you're talking thousands of bounding boxes then a database may be the 
way to go.  It might be quicker to load into a storage format that 
allows you to perform queries similar to those run against the MySQL 
production DB.  The production schema itself could be used, or perhaps 
even something simpler such as Berkeley DB.
