[OSM-dev] New OSM binary fileformat implementation.

Stefan de Konink stefan at konink.de
Sun Aug 1 23:00:42 BST 2010


On Sun, 1 Aug 2010, Frederik Ramm wrote:

> A lot of time is spent just reading from, and writing to, disk and parsing 
> XML. Running the whole thing with .gz files doesn't make a big difference - 
> saves some disk i/o, adds some CPU time, doesn't change XML parsing overhead.

I'm sorry, but the parsing overhead of Java or libxml is basically a known 
slowness factor. MSXML, pre/post plane parsing or even custom readers are 
not slow; they are limited only by the disk.

So the binary format, per se, is only faster because:
  - smaller filesize = less io
  - encoding: no xml rewriting

Everything else is already available today with, for example, osmsucker.c, 
which obviously does not use an XML parser because all input is structured.
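
To illustrate what "no XML parser because the input is structured" can look 
like, here is a rough sketch in Python. It is not taken from osmsucker.c; 
the function name parse_node_line is made up for this example, and it 
assumes that every <node> start tag sits on a single line with double-quoted 
attributes. Entity unescaping is deliberately left out.

```python
import re

# Assumption: attributes are always written as name="value" with double
# quotes, and a <node> start tag never spans more than one line.
ATTR = re.compile(r'(\w+)="([^"]*)"')


def parse_node_line(line):
    """Return the attributes of a one-line <node ...> tag as a dict.

    Returns None for any line that does not start a node element.
    Values stay strings; XML entities are not unescaped.
    """
    stripped = line.lstrip()
    if not stripped.startswith("<node "):
        return None
    return dict(ATTR.findall(stripped))


sample = '  <node id="42" lat="52.0907" lon="5.1214" version="3"/>'
node = parse_node_line(sample)
```

A reader like this does no well-formedness checks at all, so it only works 
on input whose layout is known in advance.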


If the binary format can pack our doubles (lat/lon) and integers 
(version/ids) and make strings available in UTF-8, that saves CPU and IO 
overhead, but it makes the data no longer human-readable. I can totally 
live with that, and I hope the API protocol also gets protocol buffers.
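
As a rough sketch of what packing doubles, integers and UTF-8 strings means 
in practice, here is a small Python example using the standard struct 
module. The record layout (HEADER) and the names pack_node and unpack_node 
are assumptions for illustration only; this is neither the proposed OSM 
binary format nor protocol buffers.

```python
import struct

# Hypothetical fixed header, little-endian and unpadded:
#   int64 id, int32 version, float64 lat, float64 lon,
#   uint16 byte length of the UTF-8 string that follows.
HEADER = struct.Struct("<qiddH")


def pack_node(node_id, version, lat, lon, user):
    """Serialise one node as the fixed header plus UTF-8 string bytes."""
    name = user.encode("utf-8")
    return HEADER.pack(node_id, version, lat, lon, len(name)) + name


def unpack_node(buf):
    """Inverse of pack_node: read the header, then slice out the string."""
    node_id, version, lat, lon, length = HEADER.unpack_from(buf, 0)
    start = HEADER.size
    user = buf[start:start + length].decode("utf-8")
    return node_id, version, lat, lon, user


record = pack_node(42, 3, 52.0907, 5.1214, "Stefan")
restored = unpack_node(record)
```

Reading such a record back needs no text-to-number conversion, which is 
where the CPU saving comes from; the price is that the bytes mean nothing 
without the layout.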


Stefan


