[OSM-dev] Binary OSM db (was: Re: on-disk indexing of geodata)

Sascha Silbe sascha-ml-gis-osm-dev at silbe.org
Wed Oct 22 18:43:47 BST 2008

On Mon, Oct 20, 2008 at 09:02:47AM +0200, Marcus Wolschon wrote:

> So, for an 1D-index you have ended up with a tree of order 8.
I assume you're talking about my implementation, not an idea (or 
possibly even implementation) of your own.
In the former case: For 1D indices, it's simple sorted lists searched 
via binary search. For 2D indices, it's an 8-ary RTree (i.e. 8 childs 
per node).

> Like the a file with the following record-formats:
> middle-node:
> * recordNumber[32bit] of (current value>>3)
> * recordNumber[32bit] of (current value>>3+1)
> ...
> * recordNumber[32bit] of (current value>>3+7)
That could be a pointer-based index for the "update" database.

> * record-numbers start at 2
> * a record-number of 0 denotes an empty entry
> * a first record-number of 1 denotes a leaf of the tree
> * a middle-node consisting of only 0 is an unused block
I don't like the asymmetry resulting from using record0=1 to indicate a 
leaf node, but using the MSB of the record number / ID isn't 
significantly better.

> leaf:
> * leaf-marker [2x32bit]
> * ID-Number        of indexed OSM-Object 1 [32bit]
> * record-number of indexed OSM-Object 1 [32bit]
> * ID-Number        of indexed OSM-Object 2 [32bit]
> * record-number of indexed OSM-Object 2 [32bit]
> ..
> * ID-Number        of indexed OSM-Object 6 [32bit]
> * record-number of indexed OSM-Object 6 [32bit]
Since the internal ID and the record number are one and the same, we'd 
only need one field per leaf entry.

> For a 2D-index you propose a space-filling curve that
> maps a bouding-box into a list of 1D-intervalls and
> then use the same 1D-indexing?
Nope, for 2D an RTree is used.

> You also mentioned a notion like "O(4n)". The Big-O -notation
> does not allow for constant factors. What do you mean with to
> express with this?
Nitpick: The notation allows for constant factors, they just don't 
matter. I.e. O(5n) = O(n).
I should rather have written something like 5*O(n), that probably would 
have been less confusing. The intent was to indicate that the given task 
took 5 times as long as would have been expected if scaling linearly.

CU Sascha

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 481 bytes
Desc: Digital signature
URL: <http://lists.openstreetmap.org/pipermail/dev/attachments/20081022/6845c667/attachment.pgp>

More information about the dev mailing list