[OSM-dev] New OSM binary fileformat implementation.
frederik at remote.org
Sun Aug 1 22:24:33 BST 2010
Brett Henderson wrote:
> I'll help incorporate this into the rest of Osmosis. There's a few
> things to work through though.
> * Is there a demand for the binary format in its current
> incantation? I'm not keen to incorporate it if nobody will use it.
I run a nightly job at Geofabrik which currently operates on plain
(uncompressed) OSM files and goes roughly like this (every step uses
* apply daily diff to planet file
* split planet file into continents
* split each continent into countries
* split some countries into smaller units
* split some smaller units into even smaller units
* bzip2 the lot
The whole job takes from ~ 22h at night to ~ 9h in the morning, even
though I'm ignoring the US.
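
As a rough sketch of how the splitting steps above chain together (this is not the actual Geofabrik script; the region names, the tree layout and both helper functions are made up for illustration), each extract is cut from its parent extract rather than from the full planet file:

```python
from collections import deque

# Hypothetical region tree: every entry is cut from its parent extract,
# not from the planet file. The names are examples only.
REGION_TREE = {
    "planet": ["europe", "africa"],
    "europe": ["germany", "france"],
    "germany": ["bayern"],
    "bayern": ["oberbayern"],
}


def split_steps(tree, root="planet"):
    """Yield (source, targets) pairs in breadth-first order.

    Breadth-first order guarantees that a source extract has been
    written before anything is split out of it.
    """
    queue = deque([root])
    while queue:
        source = queue.popleft()
        targets = tree.get(source, [])
        if targets:
            yield source, list(targets)
            queue.extend(targets)


def files_to_compress(tree, root="planet"):
    """Return every file the job produces, in the order it was produced.

    Assumption: "the lot" includes the planet file itself.
    """
    names = [root]
    for _, targets in split_steps(tree, root):
        names.extend(targets)
    return names


if __name__ == "__main__":
    # The daily diff step is left out; this only plans the split and
    # compress stages and prints them instead of running any tool.
    for source, targets in split_steps(REGION_TREE):
        print(f"split {source}.osm -> " + ", ".join(t + ".osm" for t in targets))
    for name in files_to_compress(REGION_TREE):
        print(f"bzip2 {name}.osm")
```

The point of the sketch is only that every level re-reads and re-writes a full file, which is where the disk i/o and XML parsing time add up.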
A lot of time is spent just reading from, and writing to, disk and
parsing XML. Running the whole thing with .gz files doesn't make a big
difference - saves some disk i/o, adds some CPU time, doesn't change XML
I wanted to test-drive the binary format as a replacement for raw .osm
files in this setup, hoping that it would give me the i/o benefits of
gzip compressed data but also slash XML parsing time. The numbers that
have been posted seemed promising. I might even be able to skip the
bzip2 step at the end if the binary format should become widely used,
just placing binary files on the server; and use the saved time to
re-introduce US extracts.
So here's one user who's definitely in for it - the reason I asked right
now was that I was planning to have a go at it in the near future, and
wanted to make sure that I'm not using an old version or going down a
path that everyone else already discarded. If there's "proper"
integration with Osmosis around the corner then I'd wait for that.
The way I understood it, Scott was re-using some code he placed inside
the Osmosis tree from within his "splitter" code. Also I could imagine
that using this fancy Google library means you'll have some format
description files which might be shared across all projects using that
library, perhaps even including the C++ reader that jamesmikedupont has
built, but I'm not sure.
I prefer SVN over git for the simple reason that I only have to "svn up"
and everything is there but I'm sure it is going to be a matter of
minutes before someone from Iceland points out that the same convenience
can be had with git if one knows what they're doing ;)
Frederik Ramm ## eMail frederik at remote.org ## N49°00'09" E008°23'33"