[OSM-dev] Binary OSM db (was: Re: on-disk indexing of geodata)
Freek
freek_osm at vanwal.nl
Sun Oct 19 12:42:39 BST 2008
On Sunday 19 October 2008, Sascha Silbe wrote:
> On Fri, Oct 17, 2008 at 10:46:39PM +0200, Freek wrote:
> >> Node/way/relation internal <-> external id:
> >> - int -> ext O(1)
> >> - ext -> int O(log n)
> >
> > So what kind of indexes do you have, hash tables and balanced binary
> > search trees (what about external memory)?
>
> Much simpler, as more complex index structures also use more memory
> (thus risking disk access penalty). Sorted lists, scanned with binary
> search (=> O(log n)).
Ok, but what about the O(1)?
Regarding index structures, I think you can reduce disk access penalty by
using slightly more complex index structures (in this case B-trees for
example). The O(log_2 n) is the number of disk operations you have to
perform. Using a B-tree you can get this down to O(log_B n), where B is the
number of elements per node. If you use your "implicit pointer" idea, you
don't even have the extra storage penalty :-)
> > Anyway, when do you use such an index? I guess one would want to know
> > the OSM id of objects returned by certain queries, but not
> > specifically a translation from internal to external id (though of
> > course you probably get that for free if you have an index on internal
> > id).
>
> Usually the ID mappings are used on in- and output, respectively. The
> program core will work purely with internal IDs.
So you transparently rewrite internal IDs to external ones and vice versa on
in- and output?
What is the advantage of having separate internal IDs? (I mean, there's
probably a good reason, I just don't know...)
> > [...] Can I, for example, retrieve all post boxes using this index?
>
> Yes, you can. If you add an appropriate entry to the rules file prior to
> running the importer, that is. Excerpt from my current rules file:
>
> {'tags': {'amenity': 'post_box'}, 'maxScaleDisplay':
> 10000, 'maxScaleName': 5000, 'icon': 'public/post_box'},
>
> This file is preprocessed with a Python script, producing a file that's
> easy to parse in C++ (but much less readable).
> It's also used by my moving map viewer (with basic navigation support),
> that's why there are rendering styles in it, too.
Ah, nice. By the way, is all this still in (early) development, or is there
already a description/website/demo online?
> >> Node/way/relation "interesting" bitmap (name/ref/is_in set and/or
> >> known
> >> type):
> >> - id -> interesting O(1)
> >
> > Important design decision, but more for the data format itself: does
> > it store all nodes/ways/relations, also the "uninteresting" ones?
>
> Yes, it does - otherwise this bitmap wouldn't be necessary.
Ok. Do you plan to make your current format into a kind of standard for
binary, indexed OSM data or do you want to go with, say, Marcus' plans (for
which I don't clearly see if he wants it to be lossy or lossless)?
> Already had a good look on the current namefinder - I'm quite impressed
> by its performance, given that it's based on MySQL+PHP.
> Didn't know that Gosmore has a geolocation feature, though.
I remembered that it did. The feature list on the wiki says:
"Incremental search of all tags. Results are ordered from nearest to
farthest."
--
Freek
More information about the dev
mailing list