[OSM-dev] visible-Flag in PBF
scott at sacrosby.com
Sat May 7 22:43:11 BST 2011
On Sat, May 7, 2011 at 4:04 PM, Christian Vetter <veaac.fdirct at gmail.com> wrote:
>> There are about 80k blobs. If 1-byte tags are used for the counts,
>> overhead is 9 bytes each:
>> 2 bytes indexdata tag&length in the BlobHeader
>> 3*1 bytes (tags for 3 fields)
>> 2*1 bytes (varint count for N==0)
>> 1*2 bytes (varint count for N < 2**14)
>> I assume that few blobs contain more than one entity type. Using
>> booleans only saves one byte of overhead compared to this.
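The 9-byte figure above can be checked with a little varint arithmetic. This is only a sketch: `varint_len` and `index_overhead` are hypothetical helper names, and the count of 8000 nodes per blob is an assumed example value, not something stated in the thread.

```python
def varint_len(n: int) -> int:
    """Bytes needed to encode a non-negative integer as a protobuf varint."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative values")
    length = 1
    while n >= 0x80:  # each varint byte carries 7 payload bits
        n >>= 7
        length += 1
    return length


def index_overhead(counts, header_bytes=2, tag_bytes=1):
    """Per-blob overhead: indexdata tag+length plus one tag+varint per count."""
    return header_bytes + sum(tag_bytes + varint_len(c) for c in counts)


# Assumed example: a blob holding 8000 nodes, no ways, no relations.
overhead = index_overhead([8000, 0, 0])
total = overhead * 80_000  # about 80k blobs, as stated above

print(overhead, total)
```

With these assumptions the per-blob overhead comes to 9 bytes, or 720000 bytes across 80k blobs.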
> I believe we can get away with 4 bytes:
> 2 bytes tag + length
> 1 + 1 byte for one field ( bool )
> We omit all fields that equal zero ( they are optional ) and the
> reader can then treat that as if it were set to zero
I think that this is a bad idea, because then you can't easily
distinguish between a count of zero and files written by a program
that doesn't set a count.
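To make the ambiguity concrete, here is a minimal hand-rolled sketch of the wire encoding. The field numbers (1 = nodes, 2 = ways, 3 = relations) and the function names are assumptions for illustration only; they are not part of the PBF format.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_counts(counts: dict) -> bytes:
    """Encode {field_number: count}; a value of None means 'field omitted'."""
    out = bytearray()
    for field, value in sorted(counts.items()):
        if value is None:
            continue
        out.append(field << 3)  # wire type 0 (varint), field numbers below 16
        out += encode_varint(value)
    return bytes(out)


def decode_counts(data: bytes) -> dict:
    """Decode only the fields that are actually present on the wire."""
    result, i = {}, 0
    while i < len(data):
        field = data[i] >> 3
        i += 1
        value = shift = 0
        while True:
            byte = data[i]
            i += 1
            value |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        result[field] = value
    return result


explicit = encode_counts({1: 8000, 2: 0, 3: 0})       # zeros written out
omitted = encode_counts({1: 8000, 2: None, 3: None})  # zeros left out
no_index = encode_counts({})                          # writer sets no counts

print(len(explicit), len(omitted), len(no_index))
print(decode_counts(explicit).get(2), decode_counts(omitted).get(2),
      decode_counts(no_index).get(2))
```

With explicit zeros the reader sees a way count of 0. With omitted zeros it sees nothing for that field, which is exactly what it sees from a writer that never set a count.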
>>> About 312s to compress all blobs for Germany. Changing the dictionary
>>> size does not change much. I lowered it all the way down to 64 KB and
>>> the values stayed essentially the same.
>> And deflate?
Here are the tradeoffs: LZMA is about twice as slow as deflate to
compress and produces output about 10% smaller. Decompression should be
a little slower than deflate. Is that worth adding an LZMA dependency
to every PBF reader?
My verdict is no. Protobufs and deflate have extensive language
support; LZMA doesn't, and it may be superseded by XZ.
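Anyone who wants to check the tradeoff on their own data could start from a rough sketch like the one below, which uses Python's standard `zlib` and `lzma` modules. The `compare` helper and the synthetic sample are assumptions for illustration; real OSM blobs will give different sizes and timings, so no particular ratio should be read from it.

```python
import lzma
import time
import zlib


def compare(blob: bytes) -> dict:
    """Return {codec: (compressed_size, compress_seconds)} for one blob."""
    codecs = (
        ("deflate",
         lambda d: zlib.compress(d, 6),
         zlib.decompress),
        ("lzma",
         lambda d: lzma.compress(d, format=lzma.FORMAT_ALONE),
         lambda d: lzma.decompress(d, format=lzma.FORMAT_ALONE)),
    )
    results = {}
    for name, compress, decompress in codecs:
        start = time.perf_counter()
        packed = compress(blob)
        elapsed = time.perf_counter() - start
        # Make sure the round trip is lossless before trusting the numbers.
        if decompress(packed) != blob:
            raise RuntimeError(name + " round trip failed")
        results[name] = (len(packed), elapsed)
    return results


# Synthetic stand-in for a blob; substitute real uncompressed blob bytes.
sample = b"".join(
    b"node %d lat=%d lon=%d\n" % (i, i * 7, i * 13) for i in range(20000)
)
results = compare(sample)
for name, (size, seconds) in results.items():
    print(name, size, round(seconds, 4))
```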
Anyone want to make a compelling case for LZMA? Stefan?