<br><br><div class="gmail_quote">On Wed, Dec 1, 2010 at 8:18 PM, Scott Crosby <span dir="ltr"><<a href="mailto:scrosby@cs.rice.edu">scrosby@cs.rice.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

<div class="im">On Wed, Dec 1, 2010 at 7:27 AM, Nic Roets <<a href="mailto:nroets@gmail.com">nroets@gmail.com</a>> wrote:<br>

> Hello Scott,<br>

><br>

> How do you keep track of what bboxs each entity belongs to ?<br>

<br>

</div>An Int2ShortMultiMap implemented by composing two underlying<br>

Int2ShortMap implementations with different space efficiency<br>

tradeoffs, a custom sparsearray<br>

implementation based on<br>

<a href="http://google-sparsehash.googlecode.com/svn/trunk/doc/implementation.html" target="_blank">http://google-sparsehash.googlecode.com/svn/trunk/doc/implementation.html</a>,<br>

and one imported from a library of java collections specialized to<br>

primitive types ('fastutils') that uses a standard hashtable. The<br>

hybrid has a memory overhead of about 4 bytes per node output, storing<br>

750m keys with 800m vals in 3.2gb of heap.<br>

<br>

The approach for generating an Int2ShortMultiMap from several<br>

Int2ShortMap's is by layering them. When put()ing keys, I store in<br>

Int2ShortMap[0], but if the key is already there, I try<br>

Int2ShortMap[1], ..... until I find one that is free. I create<br>

additional maps as needed. For dense maps, sparsehash is the more<br>

efficient implementation, and for non-dense maps at the higher layers,<br>

a hashtable is the more efficient implementation.<br>

<br>

To avoid checking each bbox for each point, used a simpler design than<br>

a quadtree or r-tree as andrzej suggests, some precomputing and binary<br>

searches at a cost of (sqrt(n)) instead of O(log n) for mostly<br>

non-overlapping-regions.<br>

<div class="im"><br>

><br>

> I'm not really asking a question, I'm just saying that I found a way to<br>

> reduce the memory requirement for that considerably. Instead of a bit per<br>

> bbox per entry, I store only 16 bits or 32 bits per entry. Here is the<br>

> source.<br>

> <a href="http://trac.openstreetmap.org/browser/applications/rendering/gosmore/bboxSplit.cpp?rev=24484" target="_blank">http://trac.openstreetmap.org/browser/applications/rendering/gosmore/bboxSplit.cpp?rev=24484</a><br>


<br>

</div>Definitely. Can you explain how you track the information?<br>

<br>

You have a much more concise implememtation than mine, but I can't<br>

easily figure it out what it is doing from the source code.<br></blockquote><div><br>Let's say there are 80 bboxes, then you can use a 80 bit number for each entity to record which bboxes they fell into and which they didn't. That 2^80 space is extremely sparse. For my regular splits of 80 bboxes there are typically only 1000 combinations that actually occur. So I store 1000 "younions" and then a 16-bit index per entity. And my bbox are as chaotic as they come:<br>

<a href="http://dev.openstreetmap.de/gosmore/">http://dev.openstreetmap.de/gosmore/</a><br> </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">


<font color="#888888"><br>

Scott<br>

</font></blockquote></div><br>