[OSM-dev] New OSM binary fileformat implementation.

Frederik Ramm frederik at remote.org
Thu Apr 29 08:15:59 BST 2010


Scott,

Scott Crosby wrote:
> I would like to announce code implementing a binary OSM format that
> supports the full semantics of the OSM XML. 

This all sounds very interesting, and you seem to have spent a lot of 
thought on it and documented it well.

If I understand it correctly, this is meant to be a replacement for the 
XML files as a "transport format" for OSM data. It is not meant to offer 
random access in any way, and thus differs from other attempts at 
creating binary formats that could be used in lieu of databases, having 
indexes and all.

Maybe we should be careful about naming these formats to make their 
purpose clearer. The generic "OSM binary format" seems to mean different 
things to different people. The file extension ".bin" is perhaps not the 
best choice.

Have you considered/evaluated "Fast Infoset" and if so, what were the 
reasons against that?

> It is 5x-10x faster at
> reading and writing and 30-50% smaller

The size figure is obviously compared to bz2; is the "5x-10x faster" 
also compared to bz2, and if so, compared to the native Java bz2 or the 
external C one?

> an entire planet, including
> all metadata, can be read in about 12 minutes and written in about 50
> minutes on a 3 year old dual-core machine. 

How did you measure write performance decoupled from read performance? 
Surely your 3 year old dual-core machine did not have the 150 gigs of 
RAM needed to suck the entire planet into memory?

You have paid an impressive amount of attention to details in order to 
achieve the good performance and compression rates that you do. I'm 
slightly concerned about the robustness of it all - in the past, we 
often had planet files that were broken one way or the other, and it was 
usually possible to remedy this with some standard grep, sed, or dd 
actions - if one of your files ever breaks then I guess it is likely to 
be complete garbage ;-)

> Probably the most important TODO is packaging and fixing the build system.
> I have no almost no experience with ant and am unfamiliar with java
> packaging practices, so I'd like to request help/advice on ant and 
> suggestions on
> how to package the common parsing/serializing code so that it can be
> re-used across different programs.

I suggest asking on osmosis-dev, and getting your new code into the 
Osmosis trunk quickly so people can play with it.

Bye
Frederik




