[OSM-dev] New OSM binary fileformat implementation.

Frederik Ramm frederik at remote.org
Sun Aug 1 22:24:33 BST 2010


Brett Henderson wrote:
> I'll help incorporate this into the rest of Osmosis.  There's a few 
> things to work through though.
>     * Is there a demand for the binary format in its current
>       incantation?  I'm not keen to incorporate it if nobody will use it.

I run a nightly job at Geofabrik which currently operates on plain 
(uncompressed) OSM files and goes roughly like this (every step uses 

* apply daily diff to planet file
* split planet file into continents
* split each continent into countries
* split some countries into smaller units
* split some smaller units into even smaller units
* bzip2 the lot
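
As a rough sketch of the ordering implied by the steps above (not the actual Geofabrik job), each extract is cut from its parent's file rather than from the full planet, so the job list can be derived from a region tree. The region names and the `plan_extracts` helper are made up for illustration:

```python
from typing import Dict, List, Tuple

# Hypothetical region hierarchy; the names are illustrative assumptions only.
HIERARCHY: Dict[str, List[str]] = {
    "planet": ["europe", "africa"],
    "europe": ["germany", "france"],
    "germany": ["bayern"],
}


def plan_extracts(root: str, tree: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (source, target) pairs ordered so that every source file
    has already been produced before it is split further."""
    jobs: List[Tuple[str, str]] = []
    queue = [root]
    while queue:
        parent = queue.pop(0)  # breadth-first: finish one level before the next
        for child in tree.get(parent, []):
            jobs.append((parent, child))
            queue.append(child)
    return jobs


if __name__ == "__main__":
    for source, target in plan_extracts("planet", HIERARCHY):
        print(f"{source}.osm -> {target}.osm")
```

Every pair in the list means one full read of the source file and one write of the target, which is why read and parse speed matters so much in a job like this.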

The whole job takes from ~ 22h at night to ~ 9h in the morning, even 
though I'm ignoring the US.

A lot of time is spent just reading from, and writing to, disk and 
parsing XML. Running the whole thing with .gz files doesn't make a big 
difference - saves some disk i/o, adds some CPU time, doesn't change XML 
parsing overhead.

I wanted to test-drive the binary format as a replacement for raw .osm 
files in this setup, hoping that it would give me the i/o benefits of 
gzip compressed data but also slash XML parsing time. The numbers that 
have been posted seemed promising. I might even be able to skip the 
bzip2 step at the end if the binary format should become widely used, 
just placing binary files on the server; and use the saved time to 
re-introduce US extracts.

So here's one user who's definitely in for it - the reason I asked right 
now was that I was planning to have a go at it in the near future, and 
wanted to make sure that I'm not using an old version or going down a 
path that everyone else has already discarded. If there's "proper" 
integration with Osmosis around the corner then I'd wait for that.

The way I understood it, Scott was re-using some code he placed inside 
the Osmosis tree from within his "splitter" code. Also I could imagine 
that using this fancy Google library means you'll have some format 
description files which might be shared across all projects using that 
library, perhaps even including the C++ reader that jamesmikedupont has 
built, but I'm not sure.

I prefer SVN over git for the simple reason that I only have to "svn up" 
and everything is there, but I'm sure it is going to be a matter of 
minutes before someone from Iceland points out that the same convenience 
can be had with git if one knows what they're doing ;)


Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00'09" E008°23'33"
