[OSM-dev] Osmosis split of the planet

Mon Feb 8 09:22:09 GMT 2010

Hi Nic,

I'm not entirely sure what you mean by intermediate streams.  Do you wish to
split the execution across multiple Osmosis instances and stream data
between processes?  If so, there is no built-in support for this other than
reading from stdin and writing to stdout.  Perhaps there's some way of using
named pipes on Linux, but that's not something I've ever tried.  In any case,
the main overhead is typically XML processing, which is still incurred at
the boundary between processes, so the CPU cost will be much the same as
reading and writing temporary files.
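
As a rough sketch of the named pipe idea (untested with Osmosis; the
directory and file names are made up, and plain printf/cat stand in for the
two Osmosis instances), a FIFO created with mkfifo lets one process write
while another reads, with no intermediate file on disk:

```shell
# Hypothetical scratch directory and FIFO name, for illustration only.
workdir=$(mktemp -d)
fifo="$workdir/europe.osm"
mkfifo "$fifo"

# Producer: stand-in for the first process writing OSM XML into the FIFO.
# It runs in the background because opening a FIFO blocks until a reader appears.
printf '<osm version="0.6"></osm>\n' > "$fifo" &

# Consumer: stand-in for the second process reading the same stream.
received=$(cat "$fifo")

wait                # let the background producer finish
rm -r "$workdir"
echo "$received"
```

In a real run the producer would be an Osmosis pipeline writing XML to the
FIFO path and the consumer a second pipeline reading from it.  The XML still
has to be written and parsed on each side, so this would only save the disk
I/O and the gzip / gunzip steps.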

If the only reason for splitting Osmosis across multiple instances is to
allow you to use more than 4GB memory total, then can you switch to a 64-bit
JVM?  That will let you use as much memory as you have in your system.  I
assume you're currently setting the -Xmx value to something less than 4GB
based on a 32-bit JVM limitation.
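
For example, with a 64-bit JVM the heap limit could be raised roughly like
this.  This assumes the osmosis launcher script passes JAVACMD_OPTIONS
through to java, and the 6g figure is only a placeholder for whatever your
machine can spare:

```shell
# Assumption: the osmosis launcher script reads JAVACMD_OPTIONS.
# 6g is a placeholder; pick a value below your physical memory.
JAVACMD_OPTIONS="-Xmx6g"
export JAVACMD_OPTIONS
```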

As an FYI, the BitSet idTrackerType uses a fixed amount of memory per
bounding box task dependent on the maximum ids in the planet file.  So if
you're extracting 60 bounding boxes, it doesn't matter where in the world
the bounding box resides or how large it is, the BitSet will consume much
the same memory.  You need to find how many bounding boxes can fit within
the 4GB memory limit and stick within that limit.  The limit will decrease
over time as the maximum ids in the planet increase.  The IdList
idTrackerType uses memory proportional to the number of entities in the
bounding box, but it consumes much more memory per entity (at least 32 times
as much, maybe 64; I haven't investigated this in detail) because it holds
each id in a sorted list, whereas BitSet stores each id as a single bit in a
massive data array.  IdList is great for large numbers of small bounding
boxes where the total area covers a small portion of the planet.  If
somebody can code up a better mechanism for storing these ids, it can be
plugged in as an alternative idTrackerType.
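
To make the BitSet arithmetic concrete, here is a back-of-the-envelope
estimate.  The maximum ids below are made-up round numbers rather than real
planet figures, and the sketch assumes one bit per possible id per task and
ignores all other JVM overhead, so treat the result as an upper bound at
best:

```shell
# Hypothetical maximum ids, for illustration only; substitute real values.
max_node_id=700000000
max_way_id=50000000
max_relation_id=500000

# Memory budget: the 4GB limit discussed above, in bytes.
budget_bytes=$((4 * 1024 * 1024 * 1024))

# BitSet: one bit per possible id, rounded up to whole bytes, per --bb task.
bitset_bytes=$(( (max_node_id + max_way_id + max_relation_id + 7) / 8 ))

# How many bounding box tasks fit in the budget, ignoring everything else.
max_tasks=$(( budget_bytes / bitset_bytes ))

# IdList at 32 or 64 bits per stored id: below these node counts per bounding
# box it would need less memory than a BitSet over the whole node id range.
idlist_break_even_32=$(( max_node_id / 32 ))
idlist_break_even_64=$(( max_node_id / 64 ))

echo "BitSet bytes per task: $bitset_bytes"
echo "Tasks that fit in budget: $max_tasks"
echo "IdList break-even node count: $idlist_break_even_64 to $idlist_break_even_32"
```

The last two values give the per-box node count below which IdList, at 64 or
32 bits per id, would use less memory than a BitSet covering the node id
range.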

Brett

On Mon, Feb 8, 2010 at 7:54 PM, Nic Roets <nroets at gmail.com> wrote:

> I'm trying to split the planet into 170 overlapping bboxes like this:
>
> http://dev.openstreetmap.de/gosmore/test/
>
> But osmosis keeps running into the 4GB java limit, even after I made a
> split down the Atlantic.
> 1. Split the planet into 3 bboxes: Americas, Europe / Africa / Asia /
> Australia, and a bbox that is just large enough to cover all the bboxes that
> cross the dividing line.
> 2. Running osmosis for the 95 bboxes in the Americas fails.
> 3. Running osmosis for the 12 Atlantic bboxes succeeds.
> 4. Running osmosis for the 60 bboxes in Europe / Africa / Asia / Australia
> fails:
>
> gunzip <middle.osm.gz | ionice -c 3 nice -n 19 osmosis --read-xml
> enableDateParsing=no file=/dev/stdin --tee 60 \
>  --bb  idTrackerType="BitSet" left=73.12500 right=180.00000 top=9.44906
> bottom=-85.05113 --wx 0720048510241024.osm.gz \
>  --bb  idTrackerType="BitSet" left=120.58594 right=180.00000 top=72.91964
> bottom=-25.48295 --wx 0855020310240587.osm.gz \
>  --bb  idTrackerType="BitSet" left=98.43750 right=172.61719 top=13.23995
> bottom=-85.05113 --wx 0792047410031024.osm.gz \
>  --bb  idTrackerType="BitSet" left=100.19531 right=150.82031 top=30.14513
> bottom=-75.84517 --wx 0797042209410852.osm.gz \
> ...
>
> The obvious solution is just to repeat this algorithm until I find
> something that will work. And the number of candidate splitting latitudes
> and longitudes is small (4 times the number of bboxes), so evaluating them
> all in software is feasible (esp. with dynamic programming).
>
> Now my question is: Can I tell osmosis to work with intermediate streams?
> That would remove the need to gzip / gunzip and write / read from disk.
>