[OSM-dev] [OSM-talk] Osmosis error with multiple bounding boxes
brett at bretth.com
Fri Oct 26 00:12:09 BST 2007
Karl Newman wrote:
> Crossposting to dev
>> Osmosis needs some work with regard to tee and many outputs still, because
>> it's quitting here with 100+...on out of heap space errors (The server has
>> 4GB), with 50 pipes it worked. But other than that it works many times
>> faster than processing the planet all over for each tile. So I'm happy with
> Maybe the task could be tweaked with an option to buffer all the data
> into a file, then send it on to each pipe sequentially. You could also
> set up a hierarchical/recursive series of Osmosis calls (sort of lame,
> I know).
Not sure what you mean by the hierarchical/recursive call bit ...
Firstly, as Jon mentioned, the task memory limit may need to be
increased. If that solves the problem we're done :-)
Now a bit of background. Most osmosis tasks maintain no state so
have very small RAM requirements. The --bounding-box and
--bounding-polygon tasks are an exception because they maintain a BitSet
for storing all the nodes, ways and segments that have been passed to
the output. To be honest I haven't checked to see how much RAM this
uses but so far it hasn't been an issue for me. Presumably the RAM
usage is something like (NUM_ENTITIES / 8) bytes. Each bit takes
1/8 of a byte, which isn't much but adds up given the number of ids in
the osm db. It will also obviously add up when multiple bounding box tasks are
being used at the same time. The usage is probably proportional to the
highest id for each type regardless of whether the bit is used. If the
highest id is 100,000,000 but only 50,000,000 are used, 100,000,000 bits
will still be required. I don't know if the java implementation is
smart enough to store cleared bits sparsely but I doubt it.
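To make the estimate above concrete, here is a minimal sketch that compares the (max id / 8) figure with what java.util.BitSet reports via size() after a single high bit is set. The class and method names are made up for illustration and are not part of Osmosis. As far as I know the standard JDK BitSet is backed by a dense array of longs, so this should show storage growing with the highest id rather than with the number of bits set.

```java
import java.util.BitSet;

// Hypothetical helper for illustration only; not an Osmosis class.
class BitSetMemoryEstimate {
    /** Rough bytes needed to track ids 0..maxId with one bit each. */
    static long estimateBytes(long maxId) {
        return (maxId / 8) + 1;
    }

    /** Bytes of bit storage a BitSet reports after setting only one bit. */
    static long backingBytesAfterSetting(int id) {
        BitSet bits = new BitSet();
        bits.set(id);
        // size() is the number of bits of space actually in use.
        return bits.size() / 8;
    }

    public static void main(String[] args) {
        int maxId = 100_000_000;
        System.out.println("estimate:      " + estimateBytes(maxId) + " bytes");
        System.out.println("BitSet stores: " + backingBytesAfterSetting(maxId) + " bytes");
    }
}
```

With three entity types and one such set per bounding box task, the per-task cost is roughly three times this figure, multiplied again by the number of tasks running at once.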
A couple of solutions might be:
1. As you've already suggested, store all data to a temp file then
forward to each output sequentially. My main concern is that this will
probably be slower than re-reading the planet for each bounding box.
I've improved performance of temp files recently but they're still not
2. Persist the BitSet. There's already an IndexStore which is basically
a way of storing and loading long values indexed by a long. This could
be used to write out bits to disk and reload randomly. I think this
will be much quicker than loading complete objects from disk but it will
be a significant performance hit on the current in-memory implementation.
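A minimal sketch of what persisting the bits could look like, assuming a plain RandomAccessFile rather than the existing IndexStore (whose API is not shown here). The class name FileBackedBitSet is hypothetical, and only non-negative indexes are handled. Each get or set costs at least one seek, which is where the performance hit relative to the in-memory version would come from.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical disk-backed bit set; NOT the Osmosis IndexStore API.
class FileBackedBitSet implements AutoCloseable {
    private final RandomAccessFile file;

    FileBackedBitSet(File path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    /** Sets the bit at a non-negative index, growing the file if needed. */
    void set(long index) throws IOException {
        long bytePos = index >>> 3;
        int mask = 1 << (int) (index & 7);
        zeroFillTo(bytePos);
        int current = readByte(bytePos);
        file.seek(bytePos);
        file.write(current | mask);
    }

    /** Returns true if the bit is set; bits beyond the file end read as clear. */
    boolean get(long index) throws IOException {
        int mask = 1 << (int) (index & 7);
        return (readByte(index >>> 3) & mask) != 0;
    }

    private int readByte(long bytePos) throws IOException {
        if (bytePos >= file.length()) {
            return 0;
        }
        file.seek(bytePos);
        return file.read();
    }

    /** Explicitly writes zero bytes so the new region has defined content. */
    private void zeroFillTo(long bytePos) throws IOException {
        long length = file.length();
        if (bytePos < length) {
            return;
        }
        file.seek(length);
        byte[] zeros = new byte[8192];
        long remaining = bytePos - length + 1;
        while (remaining > 0) {
            int n = (int) Math.min(zeros.length, remaining);
            file.write(zeros, 0, n);
            remaining -= n;
        }
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

A real implementation would probably want to cache recently used blocks in memory, since ids in a planet file tend to arrive in ascending order.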
I suspect that the fastest approach will be to work around it by simply
running as many bounding box tasks as will fit in RAM, and invoking
osmosis several times if many bounding boxes are required, re-reading the
planet for each invocation. Or perhaps a variant on Option 1 might be
appropriate where the --tee task stores all input data then forwards it
to several output tasks at a time but not all at once to avoid reading
from the store more times than necessary. I'll call this Option 3.
3. Store all input data. Forward to a configurable number of outputs at
a time. For example, if 150 bboxes are required, store all input data,
then forward to 50 bboxes at a time.
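The batching in Option 3 can be sketched as below. This only shows how the outputs would be split into groups of a configurable size. The class OutputBatcher and the string stand-ins for bounding box outputs are hypothetical, and the replay of the stored input is left as a comment because it depends on the store classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not an Osmosis class.
class OutputBatcher {
    /** Splits outputs into consecutive batches of at most batchSize items. */
    static <T> List<List<T>> batches(List<T> outputs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < outputs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, outputs.size());
            result.add(new ArrayList<>(outputs.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> bboxes = new ArrayList<>();
        for (int i = 0; i < 150; i++) {
            bboxes.add("bbox-" + i);
        }
        int pass = 0;
        for (List<String> batch : batches(bboxes, 50)) {
            // A real task would replay the stored input once per batch here.
            pass++;
            System.out.println("pass " + pass + ": " + batch.size() + " outputs");
        }
    }
}
```

The batch size trades memory for I/O: larger batches mean fewer reads of the store but more id-tracking bit sets held in RAM at once.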
If anybody wishes to experiment, the classes in question are:
BigBitSet - A wrapper around java BitSet allowing the use of long
indexes (currently just casts to an int but will be extended when
necessary) and negative numbers.
AreaFilter - The base class for --bounding-box (BoundingBoxFilter) and
--bounding-polygon (PolygonFilter) tasks. It is the one that utilises
EntityTee - The task implementation for "tee-ing" input data to multiple
ChangeTee - The same as EntityTee but for change streams. Probably not
IndexStore - Can be used for random access long data type persistence.
SimpleObjectStore - Useful for simple store and forward solutions.
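For anyone experimenting with the BigBitSet idea, here is a minimal sketch of a bit set that accepts long indexes, including negative ones, by keeping two java.util.BitSet instances. This is an assumption about one way to do it, not the actual Osmosis source, and the name SignedLongBitSet is made up. Like the cast-to-int approach described above, it cannot address indexes beyond the int range and fails loudly instead of truncating.

```java
import java.util.BitSet;

// Hypothetical sketch; NOT the actual Osmosis BigBitSet implementation.
class SignedLongBitSet {
    private final BitSet nonNegative = new BitSet();
    private final BitSet negative = new BitSet();

    void set(long index) {
        if (index >= 0) {
            nonNegative.set(toInt(index));
        } else {
            // Map -1 -> 0, -2 -> 1, ... so negative ids get their own set.
            negative.set(toInt(-(index + 1)));
        }
    }

    boolean get(long index) {
        if (index >= 0) {
            return nonNegative.get(toInt(index));
        }
        return negative.get(toInt(-(index + 1)));
    }

    /** Narrows to int, rejecting values a single BitSet cannot address. */
    private static int toInt(long value) {
        if (value > Integer.MAX_VALUE) {
            throw new IndexOutOfBoundsException("index out of int range: " + value);
        }
        return (int) value;
    }
}
```

Going past the int range would need several BitSets (or a long[] managed by hand) selected by the high bits of the index.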
If somebody wishes to improve temp file performance, the main classes to
look at are all the classes in the com.bretth.osmosis.core.store package.
StoreWriter, StoreReader, ObjectWriter, ObjectReader are some of the
classes used by all object store implementations. There may be some
performance gains possible.