[OSM-dev] Binary OSM db (was: Re: on-disk indexing of geodata)

Freek freek_osm at vanwal.nl
Sun Oct 19 15:11:46 BST 2008

On Sunday 19 October 2008, Matt Amos wrote:
> On Sun, Oct 19, 2008 at 12:42 PM, Freek <freek_osm at vanwal.nl> wrote:
> > On Sunday 19 October 2008, Sascha Silbe wrote:
> >> On Fri, Oct 17, 2008 at 10:46:39PM +0200, Freek wrote:
> >> >> Node/way/relation internal <-> external id:
> >> >>      - int -> ext O(1)
> >> >>      - ext -> int O(log n)
> >> >
> >> > So what kind of indexes do you have, hash tables and balanced binary
> >> > search trees (what about external memory)?
> >>
> >> Much simpler, as more complex index structures also use more memory
> >> (thus risking disk access penalty). Sorted lists, scanned with binary
> >> search (=> O(log n)).
> >
> > Ok, but what about the O(1)?
> as i understand it (having lurked on this discussion) the internal to
> external mapping is done by mmap()ed array lookup.

Right, that's possible because internal IDs are allocated consecutively, so 
they act as index/pointer in this case.

> > Regarding index structures, I think you can reduce disk access penalty by
> > using slightly more complex index structures (in this case B-trees for
> > example). The O(log_2 n) is the number of disk operations you have to
> > perform. Using a B-tree you can get this down to O(log_B n), where B is
> > the number of elements per node. If you use your "implicit pointer" idea,
> > you don't even have the extra storage penalty :-)
> in this case i think a trie (or a van emde boas tree) would be the
> most efficient, but at the cost of implementation complexity. if this
> is to become a common format, its definitely worth keeping it simple :-)

Yes, it depends on how often you would like to do such an external-to-
internal-ID look up. If this only happens occasionally, using binary search on 
the above-mentioned array may suffice (assuming internal and external IDs 
have the same order). Otherwise, vEB trees or similar can always be added 
later (the static setting making everything a lot less complex).


More information about the dev mailing list