[OSM-dev] on-disk indexing of geodata

Fri Oct 17 17:10:51 BST 2008

On Friday 17 October 2008, Sascha Silbe wrote:
> On Fri, Oct 17, 2008 at 05:20:52PM +0200, Freek wrote:
> >> But how do you do those splits, i.e. node inserts?
> >
> > Just pick a new block somewhere else where you have free space (for
> > example at the end of your mmap()'ed file), then split the pointers to
> > the child nodes evenly over the two blocks (old and new one) so they
> > are about half-full, and finally add a pointer to the new block in the
> > parent node (again splitting if necessary, etc.).
>
> OK, so you just use pointers (=> index gets much larger)

Ah, you're using the fact that your tree is a complete fan-out-8 tree (except 
for the last part)? That's nice indeed.

> and often give 
> up locality (even if you spread some free blocks across the index).

If your blocks are so large that seeking for them takes about as long as 
retrieving them, it should not be that big a problem. I think a fan-out of 8 
is quite low.

> Somehow I'd hoped for a silver bullet. ;)
> For regular planet imports and hourly diffs, using the current structure
> + invalidation bitmap for the bulk part and a pointer-based RTree for
> the updated objects might be an idea. Or just a second database with the
> current structure (=> easy to implement) for the updated objects -
> updates usually are small (compared to the whole data set), so an index
> rebuild should be fast.

That might be quite a good idea in OSM practice, you just rebuild the bulk 
database every so often. Perhaps you can even just use an internal-memory 
data structure (KD-tree, quad tree, something like that) for the updated 
part.

-- 
Freek