[OSM-dev] Binary OSM; the first pass encoder

Stefan de Konink stefan at konink.de
Sun Nov 9 13:23:34 GMT 2008


Marcus Wolschon wrote:
> Stefan de Konink schrieb:
>> The two problems I see:
>> - In order to allow updates there should be some form of
>>   'updatable space'.
>> - If this space is not present it might be good to have one file
>>   that contains *all* strings, another one that contains the rest
>>   of the data, and maybe a final client-side generated index on
>>   both of them.
> I do this with large strings (strings with <32 characters are inline)
> and separate files for nodes, ways, indexes, ...
> I thought about storing all strings externally but ended up noting
> that most tags are not very long.

The point is not that tags are long but that all keys are duplicates :)
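To illustrate the duplicate-keys point, here is a minimal sketch of a string table in Python. It is not the encoder's actual layout; the function name `build_string_table` and the sample tags are made up for the example. Every distinct key or value is stored once and the tags only carry small integer ids.

```python
def build_string_table(tags):
    """Map each distinct string to a small integer id, in first-seen order."""
    table = {}      # string -> id
    strings = []    # id -> string (this is what would go in the strings file)
    encoded = []    # tags rewritten as (key_id, value_id) pairs
    for key, value in tags:
        ids = []
        for s in (key, value):
            if s not in table:
                table[s] = len(strings)
                strings.append(s)
            ids.append(table[s])
        encoded.append(tuple(ids))
    return strings, encoded


# Hypothetical sample data: "highway" occurs three times but is stored once.
tags = [("highway", "residential"), ("name", "Dorpsstraat"),
        ("highway", "residential"), ("highway", "primary")]
strings, encoded = build_string_table(tags)
```

With this input the table holds five strings for eight key/value slots; on real data, where a handful of keys repeat millions of times, the saving would be much larger.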

> My purpose is not minimal bandwidth for transmission but a good
> on-disk format, so my metrics are different from yours.

True. Don't forget that my first pass encoding is the way you want it to
be. Maybe you could drop 'users' and 'timestamps' from the data. This
would significantly reduce the amount of data.

> No. Attributes that are longer than 32 bytes are stored in an
> external file.
> Nearly all attributes fit in here, so most accesses require no
> additional seek.

Ok :) Sounds like Paradox :) :) :) Good!
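A rough sketch of that inline/external scheme, as I understand it, could look like the following. This is not Marcus' actual record format: the 1-byte marker, the `0xFF` value, the pointer layout and the names `pack_value`/`unpack_value` are all assumptions for illustration, and I treat exactly 32 bytes as still inline.

```python
import struct

INLINE_MAX = 32                    # bytes stored inside the record itself
EXTERNAL = 0xFF                    # assumed marker: payload points into the overflow file
RECORD = struct.Struct("<B32s")    # 1 length/marker byte + 32 payload bytes = 33 bytes


def pack_value(text, overflow):
    """Return a fixed-length record; long values are appended to `overflow`."""
    data = text.encode("utf-8")
    if len(data) <= INLINE_MAX:
        return RECORD.pack(len(data), data)        # struct pads with NUL bytes
    offset = len(overflow)
    overflow.extend(data)                          # stands in for the external file
    pointer = struct.pack("<QI", offset, len(data))
    return RECORD.pack(EXTERNAL, pointer)


def unpack_value(record, overflow):
    """Read a value back; only the long case touches `overflow`."""
    marker, payload = RECORD.unpack(record)
    if marker != EXTERNAL:
        return payload[:marker].decode("utf-8")
    offset, length = struct.unpack_from("<QI", payload)
    return bytes(overflow[offset:offset + length]).decode("utf-8")
```

Because every record has the same size, record number n sits at byte offset n * 33, and a value can be rewritten in place as long as it still fits, which is what makes updates cheap.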

>>> I am trying something similar but with fixed length records and
>>> back-links from node to way to allow updates to be applied to the
>>> file. 1.4GB-135MB is nice but you still don't want to download
>>> 135MB every day to have an up-to-date netherlands-file (let alone
>>> to do this for the planet).
>> 135MB (gzip XML) -> 78MB (bzip2 bin)
>>
>> We are looking at the possibility of binary diffing the files to
>> allow partial updates, by XORing the diff against the copy already
>> present at the user.
> 
> Interesting. It could make a good download-format.
> I'm looking forward to seeing this happen. :)
> How do you intend to handle the boundary-rectangles for diffs
> if a user does not store all the world on e.g. his small nettop?
> Use binary-files and diffs per country?

Binary files per country sounds like the most reasonable thing to do. The
other problem, that the file will only grow bigger and become fragmented,
is a separate issue. So we might have to implement a search and
reorder every 3 months.


>>> I was quite occupied with another open-source-project of mine and
>>>  switching jobs but now I should have the time to finish
>>> implementing my own proposal in code and test its performance.
>> :) good luck :) if you want to team up to write the ultimate code,
>> just send a private mail.
> I just started implementing the memory-mapped io-code for my nodes.obm.
> First I want to get this brainchild of mine going and then implement
> a Java-parser for your format for binary-downloads.
> This could get really fast. :)

At #osm-nl we are discussing the float -> long thing. I used floats 
because it (obviously) allows more precision, but I agree on some points 
mentioned before.
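For anyone who wants to compare the two representations, here is a small sketch. It assumes that 'long' means a 32-bit signed integer and that coordinates are scaled by 1e7 (units of 1e-7 degree); neither is decided yet, and the function names are made up for the example.

```python
import struct

SCALE = 10_000_000   # assumed scale factor: one unit = 1e-7 degree


def to_fixed(deg):
    """Degrees -> scaled integer; +/-180 degrees still fits in 32 bits."""
    return round(deg * SCALE)


def from_fixed(value):
    """Scaled integer -> degrees."""
    return value / SCALE


def via_float32(deg):
    """Round-trip a coordinate through a 4-byte IEEE float."""
    return struct.unpack("<f", struct.pack("<f", deg))[0]


def via_int32(deg):
    """Round-trip a coordinate through a 4-byte scaled integer."""
    packed = struct.pack("<i", to_fixed(deg))
    return from_fixed(struct.unpack("<i", packed)[0])
```

Both encodings take 4 bytes per coordinate. Calling `via_float32(lon)` and `via_int32(lon)` on a few real coordinates and comparing the results against the input shows how much each one loses.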


Stefan



