[OSM-dev] Splitting the planet into thousands of pieces in one pass.

Scott Crosby scrosby at cs.rice.edu
Wed Dec 1 18:18:39 GMT 2010


On Wed, Dec 1, 2010 at 7:27 AM, Nic Roets <nroets at gmail.com> wrote:
> Hello Scott,
>
> How do you keep track of what bboxs each entity belongs to ?

An Int2ShortMultiMap implemented by composing two underlying
Int2ShortMap implementations with different space efficiency
tradeoffs, a custom sparsearray
implementation based on
http://google-sparsehash.googlecode.com/svn/trunk/doc/implementation.html,
and one imported from a library of java collections specialized to
primitive types ('fastutils') that uses a standard hashtable. The
hybrid has a memory overhead of about 4 bytes per node output, storing
750m keys with 800m vals in 3.2gb of heap.

The approach for generating an Int2ShortMultiMap from several
Int2ShortMap's is by layering them. When put()ing keys, I store in
Int2ShortMap[0], but if the key is already there, I try
Int2ShortMap[1], ..... until I find one that is free. I create
additional maps as needed. For dense maps, sparsehash is the more
efficient implementation, and for non-dense maps at the higher layers,
a hashtable is the more efficient implementation.

To avoid checking each bbox for each point, used a simpler design than
a quadtree or r-tree as andrzej suggests, some precomputing and binary
searches at a cost of (sqrt(n)) instead of O(log n) for mostly
non-overlapping-regions.

>
> I'm not really asking a question, I'm just saying that I found a way to
> reduce the memory requirement for that considerably. Instead of a bit per
> bbox per entry, I store only 16 bits or 32 bits per entry. Here is the
> source.
> http://trac.openstreetmap.org/browser/applications/rendering/gosmore/bboxSplit.cpp?rev=24484

Definitely. Can you explain how you track the information?

You have a much more concise implememtation than mine, but I can't
easily figure it out what it is doing from the source code.

Scott



More information about the dev mailing list