[OSM-dev] [OSM-talk] Osmosis error with multiple bounding boxes
Brett Henderson
brett at bretth.com
Fri Oct 26 10:48:50 BST 2007
Lambertus wrote:
> The current method of tee is indeed not truly scalable ;-) With 100
> pipes and -Xmx2048m the processing is still in the clear, but with 200
> Osmosis quits with 'out of heap memory' after processing the nodes (no
> ways exported yet).
That all depends on your definition of scalable; I'm impressed that it
made it that far :-) How many bounding boxes do you need to create? If
it's only in the hundreds, then invoking osmosis several times, with a
subset of the boxes on each run, is likely to give you the best
performance. Otherwise we can look into persistence alternatives.
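
To illustrate the "invoke osmosis several times" idea, here is a rough
sketch that splits a list of boxes into batches and builds one osmosis
command line per batch. The class, the Box holder and the output file
naming are hypothetical. The task and argument names (--read-xml,
--tee outputCount=, --bounding-box, --write-xml) are assumed from the
documented osmosis usage and should be checked against the version you
are running.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: one osmosis command line per batch of boxes. */
public class BatchedOsmosisCommands {

    /** Hypothetical holder for a named bounding box (degrees). */
    static final class Box {
        final String name;
        final double left, bottom, right, top;

        Box(String name, double left, double bottom, double right, double top) {
            this.name = name;
            this.left = left;
            this.bottom = bottom;
            this.right = right;
            this.top = top;
        }
    }

    /** Builds one argument list per batch of at most batchSize boxes. */
    static List<List<String>> buildCommands(String inputFile, List<Box> boxes, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<List<String>> commands = new ArrayList<>();
        for (int start = 0; start < boxes.size(); start += batchSize) {
            int end = Math.min(start + batchSize, boxes.size());
            List<Box> batch = boxes.subList(start, end);

            List<String> cmd = new ArrayList<>();
            cmd.add("osmosis");
            cmd.add("--read-xml");
            cmd.add("file=" + inputFile);
            // Assumed argument name; the tee fans the stream out to each box.
            cmd.add("--tee");
            cmd.add("outputCount=" + batch.size());
            for (Box b : batch) {
                cmd.add("--bounding-box");
                cmd.add("left=" + b.left);
                cmd.add("bottom=" + b.bottom);
                cmd.add("right=" + b.right);
                cmd.add("top=" + b.top);
                cmd.add("--write-xml");
                cmd.add("file=" + b.name + ".osm"); // hypothetical naming
            }
            commands.add(cmd);
        }
        return commands;
    }

    public static void main(String[] args) {
        List<Box> boxes = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            boxes.add(new Box("tile" + i, i, 0, i + 1, 1));
        }
        // 100 boxes per run stayed within the heap in the report above.
        for (List<String> cmd : buildCommands("planet.osm", boxes, 100)) {
            System.out.println(String.join(" ", cmd.subList(0, 5)) + " ...");
        }
    }
}
```

Each run re-parses the whole input file, so the batch size trades heap
usage against the number of passes over the planet.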
Having thought about it a bit more, my previous suggestion of persisting
bit sets to disk may not be a great idea. The files are going to be very
large, and with random access patterns they are going to cause your disk
to spend all of its time seeking, because we already know they're not
going to be fully cached in RAM. It could be worth a try because it
should be relatively simple to implement with the IndexStore class; I'm
just not optimistic about its performance. A smarter index would be more
appropriate, one that caters to the fact that the selected ids within a
bounding box are going to be very sparse. In other words, the number of
ids within the box is small compared to the number outside it.
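
As a rough illustration of what a sparse index could look like, the
sketch below keeps only the selected ids in a sorted long array and
looks them up with a binary search, instead of reserving one bit for
every possible id. SparseIdSet is a hypothetical name for this example,
not an existing osmosis class, and this is only one of several possible
structures.

```java
import java.util.Arrays;

/** Hypothetical sparse id set: stores only the selected ids. */
public class SparseIdSet {
    private long[] ids = new long[16];
    private int count = 0;
    private boolean sorted = true;

    /** Records an id as selected. Duplicates are tolerated. */
    public void add(long id) {
        if (count == ids.length) {
            ids = Arrays.copyOf(ids, count * 2);
        }
        // Ids usually arrive in ascending order, so sorting is rarely needed.
        if (count > 0 && id < ids[count - 1]) {
            sorted = false;
        }
        ids[count++] = id;
    }

    /** Returns true if the id was added earlier. */
    public boolean contains(long id) {
        if (!sorted) {
            Arrays.sort(ids, 0, count);
            sorted = true;
        }
        return Arrays.binarySearch(ids, 0, count, id) >= 0;
    }

    /** Number of stored entries, including any duplicates. */
    public int storedCount() {
        return count;
    }

    public static void main(String[] args) {
        SparseIdSet selected = new SparseIdSet();
        selected.add(42L);
        selected.add(7L);
        selected.add(1_000_000_000_000L);
        System.out.println(selected.contains(7L));
        System.out.println(selected.contains(8L));
    }
}
```

Memory use here grows with the number of selected ids (8 bytes each)
rather than with the highest id in the file, which is the property that
matters when each box selects only a small fraction of the data.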
The store and forward approach isn't likely to help much either; the
Java XML parser is likely to be faster than reading from temp files.
I'm sure there are ways of speeding up the temp files further, but it's
not trivial.
If you're talking about thousands of bounding boxes then a database may
be the way to go. It might be quicker to load into a storage format that
allows you to perform queries similar to those run against the MySQL
production DB. The production schema itself could be used, or perhaps
even something simpler such as Berkeley DB.