[OSM-dev] Binary OSM db (was: Re: on-disk indexing of geodata)

Sascha Silbe sascha-ml-gis-osm-dev at silbe.org
Wed Oct 22 17:26:26 BST 2008


On Sun, Oct 19, 2008 at 01:42:39PM +0200, Freek wrote:

>> Much simpler, as more complex index structures also use more memory
>> (thus risking disk access penalty). Sorted lists, scanned with binary
>> search (=> O(log n)).
> Ok, but what about the O(1)?
Indices (or data) with one entry per id (potentially a starting point 
inside another index).

> Regarding index structures, I think you can reduce disk access penalty 
> by using slightly more complex index structures (in this case B-trees 
> for example). The O(log_2 n) is the number of disk operations you have 
> to perform. Using a B-tree you can get this down to O(log_B n), where 
> B is the number of elements per node. If you use your "implicit 
> pointer" idea, you don't even have the extra storage penalty :-)
Depending on the application (RAM size, access patterns, etc.), it might 
be worth it. Configurable index methods would be great. :)

> So you transparently rewrite internal IDs to external ones and vice 
> versa on in- and output?
Not transparently within an application, but for in- and output (e.g. 
start, goal and path for a router) the application probably will convert 
them, so it's transparently for the user.

> What is the advantage of having separate internal IDs?
OSM ids are sparse (at most one object per id). Internal ids always 
point to exactly one object (at least one, at most one).

> Ah, nice. By the way, is all this still in (early) development, or is 
> there already a description/website/demo online?
There's no website or demo, but it works quite fine in router prototype 
(routing works fine but CLI and postprocessing are not implemented yet).
It's available (license is GPL) in my 2008 GNU arch repository [1] as 
osmbindb--devel--0.1 (the router is osmroute--devel--2.0 in my 2007 
repository [2]). To check it out:

1. install tla (apt-get install tla, emerge tla, ...)
2. tla register-archive 
http:///sascha.silbe.org/arch/sascha-arch@silbe.org--2008
3. tla get osmbindb--devel--0.1 osmbindb--devel--0.1

This will check it out into a directory called osmbindb--devel--0.1 (the 
second parameter of the tla get invocation)

> Ok. Do you plan to make your current format into a kind of standard 
> for binary, indexed OSM data or do you want to go with, say, Marcus' 
> plans (for which I don't clearly see if he wants it to be lossy or 
> lossless)?
Well, obviously I like my format better, but if any other proposal 
(including Markus' one) turns out to be better, that's fine with me. :)

[Geocoding with Gosmore]
> I remembered that it did. The feature list on the wiki says:
> "Incremental search of all tags. Results are ordered from nearest to 
> farthest."
Judging from Nic Roets' mails on routing at osm, there is no spatial index 
involved.


[1] http:///sascha.silbe.org/arch/sascha-arch@silbe.org--2008
[2] http:///sascha.silbe.org/arch/sascha-arch@silbe.org--2007

CU Sascha

-- 
http://sascha.silbe.org/
http://www.infra-silbe.de/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 481 bytes
Desc: Digital signature
URL: <http://lists.openstreetmap.org/pipermail/dev/attachments/20081022/da47d91c/attachment.pgp>


More information about the dev mailing list