[OSM-dev] mmap()

Marcus Wolschon Marcus at wolschon.biz
Thu Dec 4 18:46:54 GMT 2008

2008/12/4 Stefan de Konink <stefan at konink.de>:
> Marcus Wolschon wrote:
>> My reference-implementation for the osmbin-format is nearly
>> completely debugged and optimized and I was able to generate
>> statistics about the number of ways referenced per node,
>> distribution of tag-value-length, ... for a much larger sample.
>> This time not using the city of Hamburg but the state of
>> Baden Wuerttemberg (Germany).
>> http://wiki.openstreetmap.org/wiki/User:MarcusWolschon/osmbin_draft
>> I'll try to do all of Germany next.
> Sounds cool :) Do you have the final filesize reduction also calculated?
> Is the structure extendable? (I mean does it allow efficient network
> updates)

The file-sizes are stated in the statistics.
The index by ID is optimal if a world file or a large extract is
used. For smaller extracts it can grow larger than the nodes/ways
file itself, but at that size this does not matter.
The nodes.id2 has about the same size as nodes.obm since
I was able to reduce the average size of a nodes-record
dramatically using the first statistics.
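
To illustrate the size trade-off, here is a minimal Python sketch of a
dense index by ID. It assumes one fixed 8-byte offset per possible ID,
which is a hypothetical layout and not necessarily what osmbin uses;
build_dense_index, lookup, ENTRY and EMPTY are illustrative names only.

```python
import struct

# Assumption: one signed 64-bit record offset per possible node ID.
ENTRY = struct.Struct("<q")
EMPTY = -1  # hypothetical marker for "no record with this ID"


def build_dense_index(offsets_by_id):
    """Build a dense index from a dict of node ID -> record offset."""
    max_id = max(offsets_by_id)
    buf = bytearray(ENTRY.size * (max_id + 1))
    for slot in range(max_id + 1):
        # Every ID up to max_id gets a slot, whether it is used or not.
        ENTRY.pack_into(buf, slot * ENTRY.size, offsets_by_id.get(slot, EMPTY))
    return bytes(buf)


def lookup(index, node_id):
    """Return the record offset for node_id, or None if it is absent."""
    pos = node_id * ENTRY.size
    if node_id < 0 or pos + ENTRY.size > len(index):
        return None
    (offset,) = ENTRY.unpack_from(index, pos)
    return None if offset == EMPTY else offset


# A tiny "extract": three nodes with widely scattered IDs.
extract = {5: 0, 1000: 16, 100000: 32}
index = build_dense_index(extract)
```

With three nodes whose highest ID is 100000, this index takes 800008
bytes, far more than the three records themselves. For a world file,
where nearly every ID is in use, the same layout wastes almost nothing
and a lookup is a single seek.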

The structure is mutable, meaning you can add, modify and
remove nodes, ways and attributes at will. The reference-code
does that. Thus you can use this as a native on-disk-format for
a wide range of applications and directly apply the nightly/
hourly/minutely diffs to it without re-generating the file or
downloading more than just the diff. (All code required to do
that is already part of LibOSM.)
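
As a rough illustration of applying a diff in place, here is a minimal
Python sketch. It is not LibOSM's API: NodeStore, apply_diff, the
16-byte record layout and the change tuples are all assumptions made
for this example, and a bytearray stands in for the on-disk file.

```python
import struct

# Assumption: fixed-size node records of id, lat, lon (1e-7 degree units).
RECORD = struct.Struct("<qii")
FREE = -1  # hypothetical ID value marking an unused slot


class NodeStore:
    def __init__(self):
        self.data = bytearray()  # stands in for the nodes file
        self.slot_of = {}        # node ID -> slot number (the index)
        self.free_slots = []     # slots released by deletions

    def put(self, node_id, lat, lon):
        """Add a node, or overwrite it in place if it already exists."""
        slot = self.slot_of.get(node_id)
        if slot is None:
            if self.free_slots:
                slot = self.free_slots.pop()  # reuse a freed slot
            else:
                slot = len(self.data) // RECORD.size
                self.data.extend(bytes(RECORD.size))  # grow by one record
            self.slot_of[node_id] = slot
        RECORD.pack_into(self.data, slot * RECORD.size, node_id, lat, lon)

    def delete(self, node_id):
        """Mark the node's slot as free; the file does not shrink."""
        slot = self.slot_of.pop(node_id, None)
        if slot is not None:
            RECORD.pack_into(self.data, slot * RECORD.size, FREE, 0, 0)
            self.free_slots.append(slot)

    def get(self, node_id):
        slot = self.slot_of.get(node_id)
        if slot is None:
            return None
        _, lat, lon = RECORD.unpack_from(self.data, slot * RECORD.size)
        return lat, lon


def apply_diff(store, changes):
    """Apply (action, id, lat, lon) tuples without rebuilding the store."""
    for action, node_id, lat, lon in changes:
        if action == "delete":
            store.delete(node_id)
        else:  # "create" and "modify" both write the record
            store.put(node_id, lat, lon)
```

In this sketch a deleted node only frees its slot, and the next created
node reuses it, so the file never has to be rewritten as a whole.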

The index can be removed and re-generated or substituted
with another kind of index at will.
(For example: I have an additional hsqldb-file with an index
of all streets, villages, house-numbers and zip-codes indexed
by their normalized name for address-searches.)
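
Why an index can be thrown away and re-generated can be shown with a
minimal Python sketch: as long as every record carries its own ID, one
scan over the data file is enough to rebuild it. The record layout,
the FREE marker and rebuild_index are assumptions for illustration,
not the actual osmbin format.

```python
import struct

# Assumption: fixed-size node records of id, lat, lon.
RECORD = struct.Struct("<qii")
FREE = -1  # hypothetical ID value marking an unused slot


def rebuild_index(data):
    """Scan all records and return a fresh dict of node ID -> slot."""
    index = {}
    for slot in range(len(data) // RECORD.size):
        node_id, _, _ = RECORD.unpack_from(data, slot * RECORD.size)
        if node_id != FREE:
            index[node_id] = slot
    return index


# Three slots: node 7, a free slot, node 3.
data = RECORD.pack(7, 1, 2) + RECORD.pack(FREE, 0, 0) + RECORD.pack(3, 4, 5)
index = rebuild_index(data)
```

The same scan could just as well feed a different kind of index, since
the data file itself does not depend on it.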

Rules exist to repair a broken file and return it to a consistent
state while losing as little information as possible.
I want to publish an automatic repair-tool later (much later).

