[OSM-dev] Osmosis split of the planet
nroets at gmail.com
Mon Feb 8 10:09:32 GMT 2010
Yes, named pipes may work as intermediate streams.
I guess I could try the 64-bit JVM. But I fear that memory will become a
problem again every few months as the planet grows (a larger planet may also
mean that I need more bboxes!)
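To put rough numbers on that: going by the description quoted below, each
BitSet tracker costs about one bit per possible id, whatever the size of the
bounding box. The following is only a back-of-the-envelope sketch in Python,
not Osmosis code. The function names and the example maximum ids are made up
for illustration, and it assumes one bitset per entity type per bounding box
and ignores all other JVM overhead.

```python
def bitset_bytes(max_id):
    """Bytes needed for one bit per possible id, rounded up to whole bytes."""
    return (max_id + 7) // 8


def max_bboxes(heap_bytes, max_ids):
    """How many bounding boxes fit in the heap if each one needs a bitset
    per entity type (a simplifying assumption, not measured behaviour)."""
    per_bbox = sum(bitset_bytes(m) for m in max_ids)
    return heap_bytes // per_bbox


# Hypothetical maximum ids, for illustration only (not real planet figures).
example_max_ids = {"node": 640_000_000, "way": 50_000_000, "relation": 500_000}

heap = 4 * 1024**3  # the 4GB limit discussed in this thread
print(max_bboxes(heap, example_max_ids.values()))
```

Raising the maximum ids lowers the result, which is why a bbox count that
fits today may stop fitting a few months later.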
As for coding a new idTrackerType that would reduce memory usage: Start with
a binary space partitioning (as above) with depth k. Then have k BitSets to
indicate whether an entity is to the left* of the partition, and another k
BitSets to indicate whether it spans the partition (crosses the boundary). If
it is neither to the left nor spanning, then it is to the right of the
partition. In my case it would reduce the number of BitSets from 190 to about
16.
* Could also be 'up', depending on the partition.
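A minimal sketch of that storage layout, in Python rather than Java and not
tied to the Osmosis idTrackerType interface. The class and function names are
hypothetical. It only shows how two bitsets per tree level can hold a
left / span / right state per entity id. How an entity that spans a partition
is tracked further down the tree is not worked out here.

```python
import math

LEFT, SPAN, RIGHT = "left", "span", "right"


class PartitionTracker:
    """Stores one of three states per (entity id, tree level) in 2*k bitsets."""

    def __init__(self, depth):
        self.depth = depth
        # Python ints serve as growable bitsets; bit i belongs to entity id i.
        self.left_bits = [0] * depth
        self.span_bits = [0] * depth

    def set_state(self, entity_id, level, state):
        mask = 1 << entity_id
        # Clear both bits first so an earlier state can be overwritten.
        self.left_bits[level] &= ~mask
        self.span_bits[level] &= ~mask
        if state == LEFT:
            self.left_bits[level] |= mask
        elif state == SPAN:
            self.span_bits[level] |= mask
        # RIGHT needs no bit: it is the state when neither bit is set.

    def get_state(self, entity_id, level):
        mask = 1 << entity_id
        if self.left_bits[level] & mask:
            return LEFT
        if self.span_bits[level] & mask:
            return SPAN
        return RIGHT

    def bitset_count(self):
        return 2 * self.depth


def depth_for(bbox_count):
    """Smallest depth k such that 2**k leaves can hold bbox_count boxes."""
    return math.ceil(math.log2(bbox_count))


tracker = PartitionTracker(depth_for(170))
tracker.set_state(entity_id=12345, level=0, state=SPAN)
print(tracker.bitset_count(), tracker.get_state(12345, 0))
```

With a depth of 8, which is enough for either 170 or 190 leaves, this gives
the 16 bitsets mentioned above.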
Thanks for your insights.
On Mon, Feb 8, 2010 at 11:22 AM, Brett Henderson <brett at bretth.com> wrote:
> Hi Nic,
> I'm not entirely sure what you mean by intermediate streams. Do you wish
> to split the execution across multiple Osmosis instances and stream data
> between processes? If so, there is no inbuilt support for this other than
> reading from stdin and writing to stdout. Perhaps there's
> some way of using named pipes in Linux but that's not something I've ever
> tried. But the main overhead is typically in XML processing, which would
> still be incurred at the boundary between processes, with much the same CPU
> overhead as reading and writing temporary files.
> If the only reason for splitting Osmosis across multiple instances is to
> allow you to use more than 4GB memory total, then can you switch to a 64-bit
> JVM? That will let you use as much memory as you have in your system. I
> assume you're currently setting the -Xmx value to something less than 4GB
> based on a 32-bit JVM limitation.
> As an FYI, the BitSet idTrackerType uses a fixed amount of memory per
> bounding box task dependent on the maximum ids in the planet file. So if
> you're extracting 60 bounding boxes, it doesn't matter where in the world
> the bounding box resides or how large it is, the BitSet will consume much
> the same memory. You need to find how many bounding boxes can fit within
> the 4GB memory limit and stick within that limit. The limit will decrease
> over time as the maximum ids in the planet increase. The IdList
> idTrackerType uses memory proportional to bounding box entity count, but it
> consumes much more memory per entity (at least 32 times as much, maybe 64,
> haven't investigated this in detail) because it holds each id
> in a sorted list, whereas BitSet stores each id as a single bit in a
> massive data array. IdList is great for large numbers of small bounding
> boxes where the total area covers a small portion of the planet. If
> somebody can come up with a better mechanism for storing these ids it can be
> plugged in as an alternative idTrackerType.
> On Mon, Feb 8, 2010 at 7:54 PM, Nic Roets <nroets at gmail.com> wrote:
>> I'm trying to split the planet into 170 overlapping bboxes like this:
>> But osmosis keeps running into the 4GB Java limit, even after I made a
>> split down the Atlantic.
>> 1. Split the planet into 3 bboxes: Americas, Europe / Africa / Asia /
>> Australia and a bbox that is just large enough to cover all the bboxes that
>> cross the dividing line.
>> 2. Running osmosis for the 95 bboxes in the Americas fails.
>> 3. Running osmosis for the 12 Atlantic bboxes succeeds.
>> 4. Running osmosis for the 60 bboxes in Europe / Africa / Asia / Australia
>> gunzip <middle.osm.gz | ionice -c 3 nice -n 19 osmosis \
>>   --read-xml enableDateParsing=no file=/dev/stdin --tee 60 \
>>   --bb idTrackerType="BitSet" left=73.12500 right=180.00000 top=9.44906 \
>>     bottom=-85.05113 --wx 0720048510241024.osm.gz \
>>   --bb idTrackerType="BitSet" left=120.58594 right=180.00000 top=72.91964 \
>>     bottom=-25.48295 --wx 0855020310240587.osm.gz \
>>   --bb idTrackerType="BitSet" left=98.43750 right=172.61719 top=13.23995 \
>>     bottom=-85.05113 --wx 0792047410031024.osm.gz \
>>   --bb idTrackerType="BitSet" left=100.19531 right=150.82031 top=30.14513 \
>>     bottom=-75.84517 --wx 0797042209410852.osm.gz \
>> The obvious solution is just to repeat this algorithm until I find
>> something that will work. And the number of candidate splitting latitudes
>> and longitudes is small (4 times the number of bboxes), so evaluating them
>> all in software is feasible (esp. with dynamic programming).
>> Now my question is: Can I tell osmosis to work with intermediate streams?
>> That would remove the need to gzip / gunzip and write / read from disk.
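The search over candidate split lines described in the quoted message could
be sketched roughly as follows. This is an illustration in Python, not
anything Osmosis provides. The names are hypothetical, and the cost function
(the size of the largest of the three groups, since each group is one osmosis
run) is an assumption. It evaluates a single split by brute force and leaves
out the dynamic programming for repeated splits.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    left: float
    right: float
    top: float
    bottom: float


def candidate_splits(bboxes):
    """Yield (axis, value) for every bbox edge: 4 candidates per bbox."""
    for b in bboxes:
        yield ("lon", b.left)
        yield ("lon", b.right)
        yield ("lat", b.top)
        yield ("lat", b.bottom)


def classify(bboxes, axis, value):
    """Split bboxes into those below the line, above it, and crossing it."""
    low, high, crossing = [], [], []
    for b in bboxes:
        lo, hi = (b.left, b.right) if axis == "lon" else (b.bottom, b.top)
        if hi <= value:
            low.append(b)
        elif lo >= value:
            high.append(b)
        else:
            crossing.append(b)
    return low, high, crossing


def best_split(bboxes):
    """Pick the candidate line whose largest group is smallest."""
    def cost(split):
        return max(len(group) for group in classify(bboxes, *split))
    return min(candidate_splits(bboxes), key=cost)
```

Applying best_split again to any group that is still too large would repeat
the procedure, as the message suggests.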