[OSM-dev] Compression types in PBF Format

Stefan de Konink stefan at konink.de
Wed Dec 1 01:29:34 GMT 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hi Scott,


Op 01-12-10 00:41, Scott Crosby schreef:
> The real question is does supporting bzip2/lzma offer advantages that
> are commensurate with the added implementation complexity, not just in
> pbf2osm but in every other reader too.

If any of gzip/bzip2/lzma gives a better compression ratio in general
(20% smaller), then that compression scheme should become the default
format, since (sadly) PBF is turning into an 'archival' format as opposed
to a wire format.
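
To make the "20% smaller" criterion concrete, here is a minimal sketch of
how such a comparison could be measured. It uses only Python's standard
zlib, bz2 and lzma modules; the function names and the sample payload are
made up for illustration, and a real experiment would feed it the
uncompressed contents of actual PBF blocks rather than this stand-in text.

```python
import bz2
import lzma
import zlib


def compressed_sizes(blob: bytes) -> dict:
    """Return the size in bytes of `blob` under each candidate compressor."""
    return {
        "raw": len(blob),
        "zlib": len(zlib.compress(blob, 9)),
        "bzip2": len(bz2.compress(blob, 9)),
        "lzma": len(lzma.compress(blob, preset=9)),
    }


def saving_vs_zlib(sizes: dict, name: str) -> float:
    """Fraction by which `name` is smaller than zlib (negative if larger)."""
    return 1.0 - sizes[name] / sizes["zlib"]


if __name__ == "__main__":
    # Stand-in payload (assumption): not real OSM data, just repetitive text.
    sample = b"<node id='1' lat='52.0' lon='4.3' user='example'/>\n" * 20000
    sizes = compressed_sizes(sample)
    for name, size in sizes.items():
        print(f"{name:6s} {size:10d} bytes")
    print(f"lzma saves {saving_vs_zlib(sizes, 'lzma'):.1%} relative to zlib")
```

Numbers from synthetic input like this say nothing about the planet file;
the sketch only shows the shape of the measurement.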


> Would you be willing to run an experiment with LZMA? If it shaves a
> gigabyte off of the planet, then I'd say its worth further
> consideration; if it shaves 100MB, then its not. Make a case for why
> it should be included.

I completely agree. But experimenting with LZMA first requires an osm2pbf
that supports LZMA. And currently I feel that the only 'true' tool that
should do something like this should be named pgsql2pbf. I honestly
cannot find a single reason why it would be good to use the XML as an
intermediate format, except for legacy support.


>>> Excuse me, but discussing potential problems of a design is not a show
>>> of lack of respect - unless presented in a form like the aforementioned
>>> "osmosis devs failed to read the specs".
>>
>> Oh dear, so because I actually feedbacked on Scott and asked questions,
>> and verified my code and implemented the specs I cannot complain osmosis
>> didn't?
> 
> You do realize that *I* designed the format AND wrote the spec AND
> wrote the osmosis reference implementation?

No, I didn't. But my archive states that James Michael DuPont also
published his OSM-Osmosis version.

<indepth>
And for the reader who has only been presented with my flame of the
osmosis implementation:

Out of the blue the osmosis implementation started to introduce -1
userids. This is documented nowhere, and representing past anonymous
edits with a negative userid is not the default at present either.
Especially since at that time the uids could not be negative (by spec)
and the format specifies 'has_uid'.
</indepth>


> That means that if there are any errors or omissions in that
> implementation or spec, they are my mistakes. If there is an
> ambiguity, then I have made the call as to what is right. If there are
> any differences between the spec, reference implementation, and the
> conceptual design, I'm the one resolving the conflict and determining
> the best way to fix the issue.

The current osmformat.proto still has an int32 for the uid, which is in
fact always a positive number in the OpenStreetMap database; this problem
has been reported before. The obvious choice would have been to leave it
undefined in message Info and to use 0 in DenseInfo.
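
As a rough illustration of what this means for a reader, the sketch below
folds the different "no user" spellings (field not set, 0, or a negative
value such as -1) into a single representation. The helper names are
hypothetical, and it assumes the DenseInfo uid list is delta coded; it
does not parse real protobuf messages.

```python
from itertools import accumulate
from typing import Iterable, List, Optional


def normalize_uid(uid: Optional[int]) -> Optional[int]:
    """Map every 'no user' spelling (unset, 0, negative) to None."""
    if uid is None or uid <= 0:
        return None
    return uid


def decode_dense_uids(deltas: Iterable[int]) -> List[Optional[int]]:
    """Undo the (assumed) delta coding, then normalize each uid."""
    return [normalize_uid(u) for u in accumulate(deltas)]


if __name__ == "__main__":
    # Running sums are 5, 0, 7, -1: two real users, two anonymous edits.
    print(decode_dense_uids([5, -5, 7, -8]))
```

A reader written this way does not care whether a writer emitted -1, 0 or
nothing at all for an anonymous edit.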


> I do appreciate you finding the bugs and ambiguities in the spec by
> being the first independent implementation, and I hope you will
> consider running the LZMA experiment, but you have been rude and
> insulting.

Basically you are asking me to run tests that Jochen should have come up
with to prove that your specification of multiple compression formats
sucked. I find this insulting. I think your choice is sound, and if a
tool doesn't implement compression scheme X, then just inform the user.

And if you found my comment in the code rude and/or insulting, I would
have expected a private email from you about two months ago, because
honestly something far ruder was written there.

But again, nobody seems to care what happens here or what is written. It
is not strange that a flamewar over a format starts seven months after
its initial publication, or that finger-pointing at code starts about two
months after its publication. I do find it interesting that someone
actually bothered to read the code; sadly I cannot speak of any broad
collaboration.


Stefan
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.16 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEAREKAAYFAkz1pP4ACgkQYH1+F2Rqwn2JHQCbBbYJN0EiYFCgtF2bQCP+CsVm
MA8AnjrA8bV/Tk8JE9KnqB78xwm6ma+b
=X7fJ
-----END PGP SIGNATURE-----


