[OSM-dev] Binary OSM db (was: Re: on-disk indexing of geodata)

Sascha Silbe sascha-ml-gis-osm-dev at silbe.org
Fri Oct 17 19:47:17 BST 2008


On Fri, Oct 17, 2008 at 07:47:14PM +0200, Freek wrote:

> Looks like we have enough ideas for adding indexes and such to an OSM 
> binary format, so if the rest of the structure is fixed I think we can 
> get it off the ground (I wouldn't mind at least helping out here and 
> there).
OK, so what do "we" want in a binary DB? Currently there is (in my 
implementation):

Node/way/relation internal <-> external id:
    - int -> ext O(1)
    - ext -> int O(log n)

Node location (2x 32bit int, 64bit Peano):
    - id -> location O(1)
    - location -> int O(log n)

Node/way/relation types (using tag -> id mapping with wildcard support 
during import):
    - id -> types ~O(1)
    - type -> ids ~O(log n)

Node/way/relation "interesting" bitmap (name/ref/is_in set and/or known 
type):
    - id -> interesting O(1)

Node/way/relation tags (k/v-pairs are \n-separated and properly 
escaped):
    - id -> all tags O(1)

Way MBR / RTree:
    - id -> MBR O(1)
    - MBR -> ids ~O(log n)

Way members (ordered sequence of node ids):
    - way id -> member ids ~O(1)
    - member id -> way ids ~O(1)

Relation members (ordered sequence of id+type+role):
    - rel id -> member ids ~O(1)
    - member id -> rel ids ~O(log n)

Node/way/relation name/ref/is_in:
    - text files sorted by id (intended for namefinder - could be 
changed into "proper" database entries similar to the way tags are 
handled)

~O(...) means it's actually also dependant on the number of entries for 
the given node/way/relation resp. the size of the MBR for RTree lookup.
Some O(log n) steps be can optimized by an apprioriate index to be O(1), 
but this further increases db size => probability of exceeding RAM and 
thus actually reducing instead of improving the performance is 
significant.
Actually, there probably isn't a one-fits-all solution for the indices, 
it depends too much on the application and its environment. So each 
index should be optional for the high-level API (makes it more complex, 
though).

My implementation isn't documented well and there's no "high-level" API 
(though both can obviously be fixed), but the above list should be a 
good basis for discussion.

CU Sascha

-- 
http://sascha.silbe.org/
http://www.infra-silbe.de/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 481 bytes
Desc: Digital signature
URL: <http://lists.openstreetmap.org/pipermail/dev/attachments/20081017/d15ec9be/attachment.pgp>


More information about the dev mailing list