[OSM-dev] Binary OSM; the first pass encoder
Stefan de Konink
stefan at konink.de
Sun Nov 9 04:33:35 GMT 2008
Hi All,
Because I am getting more and more disappointed with the current state
of affairs with respect to downloading OSM content, some people on
the Dutch OSM IRC channel thought of an alternative way of distribution
that could potentially provide binary diffs against any download made in
the past.
I wrote the first implementation of it in the last couple of hours and
tested it on the Dutch dataset. The current gzip compressed data is
about 135MB. Extracted it represents 1.4GB of XML.
The binary file is completely analogous to the XML, no shortcuts
whatsoever. The first reduction to a binary format containing only data
reduced the set to 418MB, which bzip2 compresses to 78MB.
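To put those sizes side by side, here is a small Python calculation. It only uses the numbers quoted above; treating 1.4GB as 1400MB is an assumption, and the variable names are made up for illustration.

```python
# Sizes quoted in this post, in MB (1.4GB taken as 1400MB, an assumption).
xml_mb = 1400
xml_gzip_mb = 135
binary_mb = 418
binary_bzip2_mb = 78

# Fraction of the raw XML size that the uncompressed binary file takes.
binary_vs_xml = binary_mb / xml_mb

# How much smaller the bzip2'ed binary is than the gzip'ed XML.
compressed_saving = 1 - binary_bzip2_mb / xml_gzip_mb

print(f"binary is {binary_vs_xml:.0%} of the XML size")
print(f"bzip2 binary is {compressed_saving:.0%} smaller than gzip XML")
```

Note that the compressed figures compare bzip2 against gzip, so part of that saving may come from the compressor rather than from the format.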
In principle it is nothing more than:
N [long id] [float lat] [float lon] [time_t timestamp]
[uint length of userfield] [non terminated userfield]
And likewise for the other subtrees.
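A minimal sketch of the node record above, written in Python with the struct module rather than taken from the actual encoder. The byte order and field widths (little-endian, 64-bit id and timestamp, 32-bit floats and length) are assumptions, as are the function names; the real widths depend on the platform the C code is built on.

```python
import struct

# Assumed layout: little-endian, no padding. The widths are guesses:
# 1-byte tag, 64-bit id, 32-bit floats, 64-bit time_t, 32-bit length.
NODE_HEADER = struct.Struct("<cqffqI")

def encode_node(node_id, lat, lon, timestamp, user):
    """Pack one node as: 'N', id, lat, lon, timestamp, len(user), user bytes."""
    user_bytes = user.encode("utf-8")
    header = NODE_HEADER.pack(b"N", node_id, lat, lon, timestamp, len(user_bytes))
    return header + user_bytes  # the user field is not NUL-terminated

def decode_node(buf, offset=0):
    """Unpack one node starting at offset; return (fields, next_offset)."""
    tag, node_id, lat, lon, timestamp, user_len = NODE_HEADER.unpack_from(buf, offset)
    if tag != b"N":
        raise ValueError("not a node record")
    start = offset + NODE_HEADER.size
    user = buf[start:start + user_len].decode("utf-8")
    return (node_id, lat, lon, timestamp, user), start + user_len

# Round-trip one made-up node.
record = encode_node(42, 52.3702, 4.8952, 1226205215, "stefan")
fields, next_offset = decode_node(record)
```

Because the user field carries its own length, records can be read back to back by feeding next_offset into the next decode_node call. Storing lat and lon as 32-bit floats loses some precision compared with the decimal text in the XML.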
As discussed before, it is possible to do a second-pass binary encoding
with all strings in a distinct table. The linked list used to build that
table can be recovered as an array from the storage. This would make
a significant difference for the tag keys alone.
In this case all string fields can be converted to unsigned long fields;
for now 4G of distinct strings seems enough :)
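The string-table idea could look roughly like the Python sketch below. It is not the second-pass encoder itself (which does not exist yet); the StringTable class, encode_tags function and the 32-bit little-endian index format are all assumptions made for illustration.

```python
import struct

class StringTable:
    """Assign each distinct string a small integer index (hypothetical helper)."""

    def __init__(self):
        self.index_of = {}   # string -> index, used while encoding
        self.strings = []    # index -> string, the array a reader needs

    def intern(self, text):
        """Return the index for text, adding it to the table if it is new."""
        idx = self.index_of.get(text)
        if idx is None:
            idx = len(self.strings)
            self.index_of[text] = idx
            self.strings.append(text)
        return idx

def encode_tags(tags, table):
    """Pack (key, value) pairs as pairs of unsigned 32-bit indices."""
    out = bytearray()
    for key, value in tags:
        out += struct.pack("<II", table.intern(key), table.intern(value))
    return bytes(out)

# Made-up tags; "highway" is stored once and referenced twice.
table = StringTable()
encoded = encode_tags(
    [("highway", "residential"), ("highway", "primary"), ("name", "Dam")],
    table,
)
```

Each tag then costs a fixed 8 bytes regardless of string length, and a repeated key such as "highway" is stored once in table.strings. A 32-bit index allows about 4G distinct strings, matching the limit mentioned above.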
If you are interested, you can take a peek at:
http://repo.or.cz/w/handlerosm.git?a=tree;f=osmbinary;h=1701a9194285a56e7a91536def314fb8b2e95350;hb=96c7b81af692df89bc6c5eba999e9bb61c92323c
Stefan