[OSM-dev] [OSM-talk] Osmosis error with multiple bounding boxes

Brett Henderson brett at bretth.com
Fri Oct 26 00:12:09 BST 2007

Karl Newman wrote:
> Crossposting to dev
>> Osmosis needs some work with regard to tee and many outputs still, because
>> it's quitting here with 100+...on out of heap space errors (The server has
>> 4GB), with 50 pipes it worked. But other than that it works many times
>> faster than processing the planet all over for each tile. So I'm happy with
>> it.
> Maybe the task could be tweaked with an option to buffer all the data
> into a file, then send it on to each pipe sequentially. You could also
> set up a hierarchical/recursive series of Osmosis calls (sort of lame,
> I know).
Not sure what you mean by the hierarchical/recursive call bit ...

Firstly, as Jon mentioned, the task memory limit may need to be 
increased.  If that solves the problem, we're done :-)

Otherwise ...

First, a bit of background.  Most osmosis tasks maintain no state, so 
they have very small RAM requirements.  The --bounding-box and 
--bounding-polygon tasks are an exception because they maintain a BitSet 
for storing all the nodes, ways and segments that have been passed to 
the output.  To be honest I haven't checked to see how much RAM this 
uses but so far it hasn't been an issue for me.  Presumably the RAM 
usage is something like (NUM_ENTITIES / 8) bytes.  Each bit will take 
1/8 of a byte which isn't much but will add up with the ids in the osm 
db.  It will also obviously add up when multiple bounding box tasks are 
being used at the same time.  The usage is probably proportional to the 
highest id for each type regardless of whether the bit is used.  If the 
highest id is 100,000,000 but only 50,000,000 are used, 100,000,000 bits 
will still be required.  I don't know if the Java implementation is 
smart enough to store cleared bits sparsely, but I doubt it.
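
The estimate above can be checked with a small sketch.  It is only an 
illustration: the class and method names are made up for this example, 
and the 100,000,000 figure is the hypothetical highest id used above, 
not a measurement of the real planet.  It relies on java.util.BitSet's 
size() method, which reports the number of bits of space the set is 
actually using.

```java
import java.util.BitSet;

/** Hypothetical helper for estimating BitSet memory use; not part of osmosis. */
class BitSetMemoryEstimate {

    /** Lower bound on the bytes needed to track ids 0..highestId at one bit each. */
    static long estimatedBytes(long highestId) {
        return (highestId + 1 + 7) / 8; // round up to whole bytes
    }

    /** Bytes of bit storage java.util.BitSet reports after a single bit is set. */
    static long allocatedBytesAfterSetting(int bitIndex) {
        BitSet bits = new BitSet();
        bits.set(bitIndex);
        // size() is the number of bits of space in use, always a multiple of 64.
        return bits.size() / 8L;
    }

    public static void main(String[] args) {
        int highestId = 100_000_000; // illustrative value only
        System.out.println("estimated bytes: " + estimatedBytes(highestId));
        System.out.println("allocated bytes: " + allocatedBytesAfterSetting(highestId));
    }
}
```

Setting only the highest bit already allocates storage for the whole 
range, so cleared bits are not stored sparsely.  At roughly 12.5 MB per 
set under these assumed numbers, a set per entity type in each of 100+ 
bounding box tasks would add up to several GB.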

A couple of solutions might be:
1. As you've already suggested, store all data to a temp file then 
forward to each output sequentially.  My main concern is that this will 
probably be slower than re-reading the planet for each bounding box.  
I've improved performance of temp files recently but they're still not 
blazingly fast.
2. Persist the BitSet.  There's already an IndexStore which is basically 
a way of storing and loading long values indexed by a long.  This could 
be used to write out bits to disk and reload randomly.  I think this 
will be much quicker than loading complete objects from disk but it will 
be a significant performance hit on the current in-memory implementation.
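
To give a feel for Option 2, here is a minimal sketch of a bit set that 
lives in a file rather than in RAM.  It does not use IndexStore (whose 
API I'm not reproducing here); it swaps in a plain 
java.io.RandomAccessFile instead, and FileBackedBitSet is a hypothetical 
name.  It assumes non-negative ids only.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

/** Hypothetical disk-backed bit set: one bit per non-negative id in a flat file. */
class FileBackedBitSet implements AutoCloseable {
    private final RandomAccessFile file;

    FileBackedBitSet(File path) {
        try {
            this.file = new RandomAccessFile(path, "rw");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience factory that backs the set with a temporary file. */
    static FileBackedBitSet createTemp() {
        try {
            File path = File.createTempFile("bitset", ".bin");
            path.deleteOnExit();
            return new FileBackedBitSet(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Marks the id as seen: read the byte holding its bit, set the bit, write it back. */
    void set(long id) {
        checkId(id);
        long pos = id >>> 3; // byte offset holding this bit
        try {
            extendWithZeros(pos + 1);
            file.seek(pos);
            int current = file.read();
            file.seek(pos);
            file.write(current | (1 << (int) (id & 7)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if the id was previously set; ids beyond the file end are unset. */
    boolean get(long id) {
        checkId(id);
        long pos = id >>> 3;
        try {
            if (pos >= file.length()) {
                return false;
            }
            file.seek(pos);
            return (file.read() & (1 << (int) (id & 7))) != 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Grows the file to the requested length, writing explicit zero bytes. */
    private void extendWithZeros(long neededLength) throws IOException {
        long length = file.length();
        if (length >= neededLength) {
            return;
        }
        byte[] zeros = new byte[8192];
        file.seek(length);
        while (length < neededLength) {
            int chunk = (int) Math.min(zeros.length, neededLength - length);
            file.write(zeros, 0, chunk);
            length += chunk;
        }
    }

    private static void checkId(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("negative ids are not supported in this sketch");
        }
    }

    @Override
    public void close() {
        try {
            file.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        try (FileBackedBitSet bits = FileBackedBitSet.createTemp()) {
            bits.set(1_000_000L);
            System.out.println(bits.get(1_000_000L) + " " + bits.get(42L));
        }
    }
}
```

Every set or get here costs at least one seek and one read, which is 
where the performance hit relative to the in-memory BitSet would come 
from.  A real implementation would presumably want some caching on top.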

I suspect that the fastest approach will be to work around it by simply 
running as many bounding box tasks as will fit in RAM, and invoking 
osmosis several times if many bounding boxes are required, re-reading 
the planet for each invocation.  Alternatively, a variant on Option 1 
might be appropriate: the --tee task stores all input data, then 
forwards it to several output tasks at a time (but not all at once), so 
that the store is read no more often than necessary.  I'll call this 
Option 3.

3. Store all input data.  Forward to a configurable number of outputs at 
a time.  For example, if 150 bboxes are required, store all input data, 
then forward to 50 bboxes at a time.
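
The control flow of Option 3 might look something like the sketch below.  
All names here are hypothetical, an in-memory List stands in for the 
temp file store, and plain Consumers stand in for the bounding box 
tasks; none of this is existing osmosis code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical store-and-forward helper that feeds outputs in batches. */
class BatchedForwarder {

    /** Splits the outputs into consecutive batches of at most batchSize. */
    static <T> List<List<T>> batches(List<T> outputs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < outputs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, outputs.size());
            result.add(new ArrayList<>(outputs.subList(start, end)));
        }
        return result;
    }

    /**
     * Replays the stored data once per batch, passing each entity to every
     * output in that batch. Returns the number of passes over the store.
     */
    static <E> int forward(List<E> stored, List<Consumer<E>> outputs, int batchSize) {
        int passes = 0;
        for (List<Consumer<E>> batch : batches(outputs, batchSize)) {
            for (E entity : stored) {
                for (Consumer<E> output : batch) {
                    output.accept(entity);
                }
            }
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        List<Consumer<String>> outputs = new ArrayList<>();
        for (int i = 0; i < 150; i++) {
            outputs.add(entity -> { /* a bounding box task would filter here */ });
        }
        List<String> stored = List.of("node 1", "node 2", "way 1");
        System.out.println("passes over the store: " + forward(stored, outputs, 50));
    }
}
```

The point of the batching is that only the bounding box tasks in the 
current batch need their BitSets in memory, at the cost of one read of 
the store per batch.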

If anybody wishes to experiment, the classes in question are:
BigBitSet - A wrapper around Java BitSet allowing the use of long 
indexes (currently just casts to an int but will be extended when 
necessary) and negative numbers.
AreaFilter - The base class for --bounding-box (BoundingBoxFilter) and 
--bounding-polygon (PolygonFilter) tasks.  It is the one that utilises 
BigBitSet.
EntityTee - The task implementation for "tee-ing" input data to multiple 
outputs.
ChangeTee - The same as EntityTee but for change streams.  Probably not 
applicable here.
IndexStore - Can be used for random access long data type persistence.
SimpleObjectStore - Useful for simple store and forward solutions.
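
For anyone unfamiliar with the idea behind BigBitSet, a rough sketch of 
a long-indexed wrapper that also accepts negative numbers follows.  This 
is not the actual osmosis class: LongIndexBitSet is a made-up name, and 
keeping a second BitSet for negative indexes is just one possible way to 
handle them.  Like the current implementation it narrows the index to an 
int, although here it fails loudly rather than casting silently.

```java
import java.util.BitSet;

/** Hypothetical long-indexed bit set that also accepts negative indexes. */
class LongIndexBitSet {
    private final BitSet nonNegative = new BitSet();
    private final BitSet negative = new BitSet();

    /** Sets the bit for the given index. */
    void set(long index) {
        if (index >= 0) {
            nonNegative.set(toInt(index));
        } else {
            negative.set(toInt(-(index + 1))); // maps -1 to 0, -2 to 1, ...
        }
    }

    /** Returns true if the bit for the given index has been set. */
    boolean get(long index) {
        if (index >= 0) {
            return nonNegative.get(toInt(index));
        }
        return negative.get(toInt(-(index + 1)));
    }

    /** Narrows to int, throwing ArithmeticException if the value does not fit. */
    private static int toInt(long value) {
        return Math.toIntExact(value);
    }

    public static void main(String[] args) {
        LongIndexBitSet bits = new LongIndexBitSet();
        bits.set(5L);
        bits.set(-1L);
        System.out.println(bits.get(5L) + " " + bits.get(-1L) + " " + bits.get(-5L));
    }
}
```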

If somebody wishes to improve temp file performance, the main classes to 
look at are all in the com.bretth.osmosis.core.store package.  
StoreWriter, StoreReader, ObjectWriter, ObjectReader are some of the 
classes used by all object store implementations.  There may be some 
performance gains possible.

