[OSM-dev] New OSM binary fileformat implementation.
Stefan de Konink
stefan at konink.de
Sun Aug 1 23:00:42 BST 2010
On Sun, 1 Aug 2010, Frederik Ramm wrote:
> A lot of time is spent just reading from, and writing to, disk and parsing
> XML. Running the whole thing with .gz files doesn't make a big difference -
> saves some disk i/o, adds some CPU time, doesn't change XML parsing overhead.
I'm sorry, but the parsing overhead of Java or libXML is basically a known
slowness factor. MSXML, pre/post plane parsing or even custom readers are
not slow; they are limited only by the disk.
So the binary format, per se, is only faster because:
- smaller file size = less I/O
- encoding: no XML rewriting
Everything else is already achievable with, for example, osmsucker.c,
which obviously does not use an XML parser, because all input is structured.
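A minimal sketch of the kind of custom reader meant here, assuming the
input is laid out like a typical planet dump: one start tag per line and
attribute values in double quotes. This is not osmsucker.c's actual code,
and the function name scan_nodes is hypothetical. It skips entities,
comments and multi-line tags entirely, which is exactly the trade-off a
structured-input reader makes.

```python
import re

# Matches name="value" pairs; assumes double-quoted attribute values.
_ATTR = re.compile(r'(\w+)="([^"]*)"')

def scan_nodes(lines):
    """Yield (id, lat, lon) for every line that starts a <node> element.

    Hypothetical helper: relies on the dump putting each <node ...> start
    tag on its own line; it is NOT a general XML parser.
    """
    for line in lines:
        stripped = line.lstrip()
        if not stripped.startswith("<node "):
            continue  # skip <tag>, </node>, <way>, headers, ...
        attrs = dict(_ATTR.findall(stripped))
        yield int(attrs["id"]), float(attrs["lat"]), float(attrs["lon"])

sample = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<osm version="0.6" generator="example">',
    ' <node id="1" lat="51.5" lon="-0.12" version="3"/>',
    ' <node id="2" lat="52.37" lon="4.89" version="1">',
    '  <tag k="name" v="Amsterdam"/>',
    ' </node>',
    '</osm>',
]

for node in scan_nodes(sample):
    print(node)
```

Because no tree is built and no validation happens, the cost per line is
close to the cost of reading it, which is the point being made: such a
reader ends up bound by disk I/O rather than by parsing.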
If the binary format can pack our doubles (lat/lon) and integers
(version/ids), and make strings available in UTF-8, that removes both CPU
and I/O overhead. But it also makes the data no longer human-readable. I
can totally live with that, and I hope the API protocol also gets protocol
buffers.
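To illustrate what "packing" can buy, here is a minimal sketch of
protocol-buffer-style integer encoding: zigzag mapping plus base-128
varints. The record layout (id, version, lat, lon in that order) and the
fixed-point scale of 1e-7 degrees are assumptions chosen for illustration,
not the layout of any actual OSM binary format; encode_node and
decode_node are hypothetical names.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit int to unsigned so small negatives stay small."""
    return (n << 1) ^ (n >> 63)

def varint(u: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit = 'more follows'."""
    out = bytearray()
    while True:
        b = u & 0x7F
        u >>= 7
        if u:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def encode_node(node_id: int, version: int, lat: float, lon: float) -> bytes:
    # Assumed layout: four zigzag varints; coordinates stored as integers
    # in units of 1e-7 degrees instead of 8-byte doubles or decimal text.
    fields = (node_id, version, round(lat * 1e7), round(lon * 1e7))
    return b"".join(varint(zigzag(f)) for f in fields)

def decode_node(buf: bytes):
    values, acc, shift = [], 0, 0
    for byte in buf:
        acc |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append((acc >> 1) ^ -(acc & 1))  # undo zigzag
            acc, shift = 0, 0
    node_id, version, lat_i, lon_i = values
    return node_id, version, lat_i / 1e7, lon_i / 1e7

xml = b'<node id="123456789" version="3" lat="51.5" lon="-0.12"/>'
packed = encode_node(123456789, 3, 51.5, -0.12)
print(len(xml), "bytes as XML vs", len(packed), "bytes packed")
```

Decoding is a handful of shifts and masks per field, with no number
parsing from text, which is where both the CPU and the I/O savings come
from; the price is that the bytes are no longer readable in a text editor.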