[OSM-dev] Osmosis split of the planet

Brett Henderson brett at bretth.com
Tue Feb 9 00:50:06 GMT 2010


Hi Nic,

Frederik makes a good point.  Using the BitSet id tracker for planet level
splitting, and the IdList id tracker for small bboxes, should provide significant
memory usage gains.  I've had a closer look at the code, and the IdList uses
an integer (i.e. 4 bytes, or 32 bits) of memory per id within the bounding box,
while BitSet uses 1 bit per possible id.  This means that IdList will be more
efficient than BitSet for any bounding box containing fewer than 1/32 of the
ids in the planet.  BitSet, by comparison, uses a fixed amount of memory
(maximum_id / 8 = total bytes).  In fact IdList will do slightly better than
that, because planet id allocation is sparse: many ids no longer exist in the
current planet file because the objects were deleted, yet BitSet still reserves
a bit for each of them.
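To make the trade-off concrete, here is a minimal sketch of the two tracking strategies.  The class and method names are hypothetical and this is not the Osmosis IdTracker API.  One tracker uses `java.util.BitSet`, with one bit per possible id.  The other appends ids to a growable sorted `int[]`, with 4 bytes per stored id, and looks them up by binary search.  The sketch assumes ids arrive in ascending order, as they do in a sorted planet file.

```java
import java.util.Arrays;
import java.util.BitSet;

public class IdTrackerSketch {
    // Hypothetical minimal interface, for illustration only.
    interface Tracker {
        void set(int id);
        boolean get(int id);
    }

    // One bit per possible id; memory grows with the largest id set, not the count.
    static final class BitSetTracker implements Tracker {
        private final BitSet bits = new BitSet();
        public void set(int id) { bits.set(id); }
        public boolean get(int id) { return bits.get(id); }
    }

    // One 4-byte int per id actually stored; assumes ascending insertion order
    // so the array stays sorted and binary search works without re-sorting.
    static final class IdListTracker implements Tracker {
        private int[] ids = new int[16];
        private int size = 0;

        public void set(int id) {
            if (size > 0 && id == ids[size - 1]) return; // already tracked
            if (size > 0 && id < ids[size - 1]) {
                throw new IllegalArgumentException("ids must arrive in ascending order");
            }
            if (size == ids.length) ids = Arrays.copyOf(ids, size * 2); // grow
            ids[size++] = id;
        }

        public boolean get(int id) {
            return Arrays.binarySearch(ids, 0, size, id) >= 0;
        }
    }

    // Approximate payload sizes, ignoring JVM object overhead and array slack.
    static long bitSetBytes(long maxId) { return (maxId + 7) / 8; }
    static long idListBytes(long count) { return count * 4L; }

    public static void main(String[] args) {
        Tracker a = new BitSetTracker();
        Tracker b = new IdListTracker();
        for (int id : new int[] {3, 17, 42}) { a.set(id); b.set(id); }
        System.out.println(a.get(17) + " " + b.get(17) + " " + b.get(18));
        // Break-even point: storing maxId/32 ids as ints costs the same as maxId bits.
        long maxId = 3_200_000L;
        System.out.println(idListBytes(maxId / 32) == bitSetBytes(maxId));
    }
}
```

The BitSet cost depends only on the largest id, while the list cost depends only on how many ids fall inside the box.  That asymmetry is where the 1/32 break-even point comes from.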

Some example numbers.  There are actually several id trackers created per
task, but the one containing node ids is the important one.  If there are
500 million nodes in the planet, a BitSet should consume approximately 60MB
of memory (500 million bits / 8).  A comparable IdList would consume almost
1.9GB (500 million x 4 bytes).  However, if a bounding box contains only
10 million nodes, its IdList will consume only 38MB, compared to the BitSet
still consuming a fixed 60MB of memory.
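The figures above can be reproduced with a few lines of arithmetic.  This sketch makes two assumptions: the maximum node id is roughly equal to the node count (it ignores sparseness), and sizes are in binary megabytes (MiB).  It counts only the raw bit or int payload, not JVM object overhead.

```java
public class TrackerMemoryEstimate {
    static final double MIB = 1024.0 * 1024.0;

    // BitSet: one bit for every possible id up to maxId, whatever the box size.
    static double bitSetMiB(long maxId) { return maxId / 8.0 / MIB; }

    // IdList: one 4-byte int for every id that actually falls inside the box.
    static double idListMiB(long idsInBox) { return idsInBox * 4.0 / MIB; }

    public static void main(String[] args) {
        long planetNodes = 500_000_000L; // assumption: max node id ~= node count
        System.out.printf("BitSet, any box:       %.0f MiB%n", bitSetMiB(planetNodes));
        System.out.printf("IdList, whole planet:  %.2f GiB%n", idListMiB(planetNodes) / 1024.0);
        System.out.printf("IdList, 10M-node box:  %.0f MiB%n", idListMiB(10_000_000L));
    }
}
```

Running it prints about 60 MiB, 1.86 GiB and 38 MiB, matching the estimates in the message.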

If you're having problems with the dividing line splitting bounding boxes in
two, you can use the --bounding-polygon task instead to get a more accurate
dividing line.  It consumes the same amount of memory as --bounding-box, but
uses a more complicated test to decide whether a node lies within the
specified area.  For simple polygons with a limited number of points I
think the performance will still be very good.
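For a sense of why a polygon test stays cheap for small polygons, here is a generic even-odd (ray casting) point-in-polygon sketch.  It is not necessarily how the --bounding-polygon task is implemented, and the class and method names are hypothetical.  The sketch assumes a simple polygon given as parallel longitude/latitude arrays, and it does not handle points lying exactly on an edge.  The cost is one pass over the polygon's edges per node, so a polygon with a handful of points adds very little work.

```java
public class PolygonContainsSketch {
    // Even-odd rule: cast a ray from (lon, lat) toward increasing longitude and
    // count the polygon edges it crosses; an odd count means the point is inside.
    // The polygon is implicitly closed (last vertex connects back to the first).
    static boolean contains(double[] lons, double[] lats, double lon, double lat) {
        boolean inside = false;
        int n = lons.length;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            // Only edges that straddle the ray's latitude can be crossed;
            // this also guarantees lats[i] != lats[j] in the division below.
            boolean straddles = (lats[i] > lat) != (lats[j] > lat);
            if (straddles) {
                double crossLon = lons[j]
                        + (lat - lats[j]) * (lons[i] - lons[j]) / (lats[i] - lats[j]);
                if (lon < crossLon) inside = !inside;
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        // An L-shaped area: a shape a single bounding box cannot describe.
        double[] lons = {0, 10, 10, 5, 5, 0};
        double[] lats = {0, 0, 5, 5, 10, 10};
        System.out.println(contains(lons, lats, 2, 7)); // upper-left arm
        System.out.println(contains(lons, lats, 7, 2)); // lower-right arm
        System.out.println(contains(lons, lats, 7, 7)); // the notch
    }
}
```

The L-shaped example shows a dividing line that bends around an area, which a single rectangle cannot do.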

I didn't understand your suggestion for a better id list tracker
implementation, but I think it would require a single task that has
knowledge of all bounding boxes being produced instead of the current
implementation where each box is processed independently with its own id
tracker.  Your needs may require a dedicated plugin with a new task targeted
to your requirements.

Brett


On Tue, Feb 9, 2010 at 12:00 AM, Nic Roets <nroets at gmail.com> wrote:

> Hello Frederik,
>
> I was looking for something like your outPipe=r1, inPipe=r1 trick. But as
> Brett (and you) point out, doing it in one pass will require lots and lots
> of RAM.
>
> I've also tried it with a dividing line in the US. However, 51 rectangles
> cross that dividing line, and I know 51 BitSets will require more than 4GB
> of RAM. So two passes are also not an option.
>
>
> On Mon, Feb 8, 2010 at 2:06 PM, Frederik Ramm <frederik at remote.org> wrote:
>
>>
>>
>> PS: Is it by design that some areas are not covered by any of your
>> rectangles?
>>
>>
> All areas are covered. Only after you click on a rectangle does it become
> a larger yellow rectangle, so that you can see its true size.
>
>
> _______________________________________________
> dev mailing list
> dev at openstreetmap.org
> http://lists.openstreetmap.org/listinfo/dev
>
>