[OSM-dev] Binary OSM; the first pass encoder
Stefan de Konink
stefan at konink.de
Sun Nov 9 13:23:34 GMT 2008
Marcus Wolschon wrote:
> Stefan de Konink wrote:
>> I see two problems:
>> - In order to allow updates, there should be some form of 'updatable
>>   space'.
>> - If this space is not present, it might be good to have one file that
>>   contains *all* strings, another one that contains the rest of the
>>   data, and maybe a final client-side generated index on both of them.
> I do this with large strings (strings with fewer than 32 characters are
> stored inline) and separate files for nodes, ways, indexes, ...
> I thought about storing all strings externally but ended up noting
> that most tags are not very long.
The point is not that tags are long but that all keys are duplicates :)
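A minimal sketch of the "one file with all strings" idea, assuming tags arrive as a list of key/value dicts: every key and value is interned once in a shared table and replaced by integer indices, so a key like `highway` is stored a single time no matter how many ways carry it. The name `build_string_table` and the input shape are hypothetical, not part of either encoder.

```python
def build_string_table(tagged_elements):
    """Intern every tag key and value into one shared table.

    tagged_elements: list of dicts mapping key -> value (hypothetical input).
    Returns (table, encoded): table holds each unique string once, in
    first-seen order; encoded replaces each (key, value) by table indices.
    """
    table = []   # unique strings, written once to the "strings" file
    index = {}   # string -> position in table

    def intern(s):
        if s not in index:
            index[s] = len(table)
            table.append(s)
        return index[s]

    encoded = []
    for tags in tagged_elements:
        encoded.append([(intern(k), intern(v)) for k, v in tags.items()])
    return table, encoded


if __name__ == "__main__":
    ways = [
        {"highway": "residential", "name": "Dorpsstraat"},
        {"highway": "residential", "name": "Kerkstraat"},
        {"highway": "primary"},
    ]
    table, encoded = build_string_table(ways)
    print(table)
    print(encoded)
```

The data file then only carries small integers, and the repeated keys cost one table entry each.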
> My purpose is not minimal bandwidth for transmission but a good on-disk
> format, so my metrics are different
> from yours.
True. Don't forget that my first-pass encoding is just as you want it to
be. Maybe you could drop 'users' and 'timestamps' from the data. This
would significantly reduce the amount of data.
> No. Attributes that are longer than 32 bytes are stored in an
> external file.
> Nearly all attributes fit in here, so most accesses require no
> additional seek.
Ok :) Sounds like Paradox :) :) :) Good!
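The inline/external split described above could look roughly like this. It is a sketch under assumed details: a fixed 32-byte slot whose first byte is the inline length (so at most 31 bytes of UTF-8 fit inline), and a marker byte plus an 8-byte offset for values that go to the external file. `SLOT_SIZE`, `EXTERNAL`, `encode_attribute` and `decode_attribute` are illustrative names, not Marcus's actual format.

```python
import struct

SLOT_SIZE = 32   # hypothetical fixed slot size per attribute
EXTERNAL = 0xFF  # hypothetical marker: the value lives in the external file


def encode_attribute(value: str, external: bytearray) -> bytes:
    """Return a SLOT_SIZE-byte slot; values that do not fit go to `external`."""
    data = value.encode("utf-8")
    if len(data) <= SLOT_SIZE - 1:
        # Inline: one length byte followed by the zero-padded string.
        return bytes([len(data)]) + data.ljust(SLOT_SIZE - 1, b"\0")
    # External: append a 4-byte length prefix plus the string, keep its offset.
    offset = len(external)
    external.extend(struct.pack("<I", len(data)) + data)
    return bytes([EXTERNAL]) + struct.pack("<Q", offset) + bytes(SLOT_SIZE - 9)


def decode_attribute(slot: bytes, external: bytearray) -> str:
    """Read a slot back; only external values touch the second file."""
    if slot[0] != EXTERNAL:
        return slot[1:1 + slot[0]].decode("utf-8")
    (offset,) = struct.unpack_from("<Q", slot, 1)
    (length,) = struct.unpack_from("<I", external, offset)
    return external[offset + 4:offset + 4 + length].decode("utf-8")


if __name__ == "__main__":
    external = bytearray()
    for text in ("residential", "Burgemeester van Walsumweg, Rotterdam-Noord"):
        slot = encode_attribute(text, external)
        print(len(slot), decode_attribute(slot, external))
```

Short values cost no extra seek; only values that overflow the slot need a second read from the external file.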
>>> I am trying something similar but with fixed-length records and
>>> back-links from node to way to allow updates to be applied to the
>>> file. Going from 1.4 GB to 135 MB is nice, but you still don't want to
>>> download 135 MB every day to have an up-to-date Netherlands file (let
>>> alone to do this for the planet).
>> 135MB (gzip XML) -> 78MB (bzip2 bin)
>>
>> We are currently looking into ways to binary-diff the files to allow
>> partial updates, by XORing them against the copy the user already
>> has.
>
> Interesting. It could make a good download-format.
> I'm looking forward to seeing this happen. :)
> How do you intend to handle the boundary-rectangles for diffs
> if a user does not store the whole world on, e.g., his small nettop?
> Use binary-files and diffs per country?
Binary files per country sound like the most reasonable thing to do. The
other problem, that the file will only grow bigger and become fragmented,
is a separate issue. We might have to implement a search-and-reorder pass
every three months.
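A rough sketch of the XOR idea, assuming the server has both the old and the new binary file and the client has the old one: the server XORs the two (zero-padded to equal length) and bzip2-compresses the result, which is mostly zero bytes when little has changed. The function names and the 8-byte length prefix are assumptions for illustration.

```python
import bz2


def make_xor_patch(old: bytes, new: bytes) -> bytes:
    """Server side: XOR old and new (zero-padded to the same length), then bzip2."""
    size = max(len(old), len(new))
    xored = bytes(a ^ b for a, b in zip(old.ljust(size, b"\0"), new.ljust(size, b"\0")))
    # Prefix the new length so the client can truncate if the file shrank.
    return bz2.compress(len(new).to_bytes(8, "little") + xored)


def apply_xor_patch(old: bytes, patch: bytes) -> bytes:
    """Client side: XOR the patch onto the local copy to rebuild the new file."""
    raw = bz2.decompress(patch)
    new_len = int.from_bytes(raw[:8], "little")
    xored = raw[8:]
    rebuilt = bytes(a ^ b for a, b in zip(old.ljust(len(xored), b"\0"), xored))
    return rebuilt[:new_len]


if __name__ == "__main__":
    old = b"node 1 52.1 5.1\n" * 1000
    new = old[:100] + b"X" + old[101:]  # one changed byte
    patch = make_xor_patch(old, new)
    print(len(new), "byte file,", len(patch), "byte patch")
    assert apply_xor_patch(old, patch) == new
```

XOR only stays small when records keep their byte offsets; an insertion early in the file shifts everything after it and turns the patch into noise, which is one more reason to update records in place and reorder only occasionally.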
>>> I was quite occupied with another open-source-project of mine and
>>> switching jobs but now I should have the time to finish
>>> implementing my own proposal in code and test its performance.
>> :) good luck :) if you want to team up to write the ultimate code,
>> just send a private mail.
> I just started implementing the memory-mapped I/O code for my nodes.obm.
> First I want to get this brainchild of mine going and then implement
> a Java parser
> for your format for binary downloads.
> This could get really fast. :)
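To illustrate why fixed-length records suit memory-mapped updates: record i always starts at byte i * record size, so a single field can be rewritten in place through the mapping without touching the rest of the file. The record layout below (id, fixed-point lat/lon, one way back-link) is a hypothetical example; only the filename `nodes.obm` comes from the thread.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-length node record: id, lat, lon (fixed-point int32),
# and a back-link to the index of the first way using this node (-1 = none).
NODE = struct.Struct("<qiiq")  # 8 + 4 + 4 + 8 = 24 bytes, no padding with "<"


def write_nodes(path, nodes):
    with open(path, "wb") as f:
        for node in nodes:
            f.write(NODE.pack(*node))


def read_node(mm, i):
    """Record i sits at a fixed offset, so no index is needed to find it."""
    return NODE.unpack_from(mm, i * NODE.size)


def set_way_backlink(mm, i, way_index):
    """Rewrite one record in place through the memory mapping."""
    node_id, lat, lon, _ = read_node(mm, i)
    NODE.pack_into(mm, i * NODE.size, node_id, lat, lon, way_index)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "nodes.obm")
    write_nodes(path, [(1, 521000000, 51000000, -1), (2, 521000100, 51000200, -1)])
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        set_way_backlink(mm, 1, 7)
        print(read_node(mm, 1))
```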
At #osm-nl we are discussing the float -> long thing. I used floats
because they (obviously) allow more precision, but I agree with some of
the points mentioned before.
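For the float -> long question, a small sketch that stores coordinates as integers scaled by an assumed 10^7 (one unit is 1e-7 degree, about 1.1 cm of latitude, and ±180 degrees still fits in a signed 32-bit int) and compares the round-trip error with a 4-byte float. `SCALE` and the helper names are illustrative.

```python
import struct

SCALE = 10_000_000  # assumed fixed-point scale: 1e-7 degree per unit


def to_fixed(deg: float) -> int:
    """Degrees -> signed integer; |deg| <= 180 fits in 32 bits."""
    return round(deg * SCALE)


def from_fixed(value: int) -> float:
    """Signed integer -> degrees."""
    return value / SCALE


def as_float32(deg: float) -> float:
    """Round-trip a coordinate through a 4-byte IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<f", deg))[0]


if __name__ == "__main__":
    lat = 52.3731234  # a latitude in the Netherlands, 7 decimals
    print("int32 fixed-point error:", abs(from_fixed(to_fixed(lat)) - lat))
    print("float32 error:          ", abs(as_float32(lat) - lat))
```

The precision argument depends on the width: a 4-byte float near 52 degrees N can only step in units of 2^-18 degree (roughly 0.4 m), whereas an 8-byte double is much finer than either.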
Stefan