[OSM-dev] visible-Flag in PBF

Sat May 7 22:04:16 BST 2011

Hi,

On Sat, May 7, 2011 at 10:15 PM, Scott Crosby <scott at sacrosby.com> wrote:
> However, I'm reluctant to add any of these unless there's a commitment
> to immediately add software support for these.

And I guess you are right to do so. While it might make parsing the data
faster, parsing is usually not the bottleneck when working with OSM data.
E.g., the importer of MoNav spends 50s reading the PBF and the whole
preprocessing chain takes about 5-10 minutes. Keeping the data format
simple is also very important.

> Most likely. We can always add another field to the indexdata. What
> kind of indexing data structure are you thinking of?

Similar to the ones you mentioned above: element id range and bounding box.
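
To make that a bit more concrete, here is a rough sketch of what such a per-blob index entry could look like and how a reader might use it to skip blobs. Nothing like this exists in the format today: the class name `BlobIndex`, its fields, and both helper methods are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobIndex:
    """Hypothetical per-blob index entry: element id range plus bounding box."""
    min_id: int
    max_id: int
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def may_contain_id(self, element_id: int) -> bool:
        # The blob can only hold the element if its id lies inside the range.
        return self.min_id <= element_id <= self.max_id

    def intersects(self, min_lon: float, min_lat: float,
                   max_lon: float, max_lat: float) -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return not (self.max_lon < min_lon or max_lon < self.min_lon or
                    self.max_lat < min_lat or max_lat < self.min_lat)


# Example entry with made-up values.
blob = BlobIndex(min_id=1000, max_id=8999,
                 min_lon=8.0, min_lat=48.0, max_lon=9.0, max_lat=49.0)
skip_for_id = not blob.may_contain_id(12345)
skip_for_box = not blob.intersects(10.0, 50.0, 11.0, 51.0)
```

A reader would check the entry in the BlobHeader and only decompress the blob when neither test rules it out.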

> There are about 80k blobs. If 1-byte tags are used for the counts,
> overhead is 9 bytes each:
>
>   2 bytes indexdata tag&length in the BlobHeader
>   3*1 bytes (tags for 3 fields)
>   2*1 bytes (varint count for N==0)
>   1*2 bytes (varint count for N < 2**14)
>
> I assume that few blobs contain more than one entity type. Using
> booleans only saves one byte of overhead compared to this.

I believe we can get away with 4 bytes:

  2 bytes tag + length
  1 + 1 bytes for one field (bool)

We omit all fields that equal zero (they are optional), and the reader
then treats a missing field as if it were set to zero.
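
The two byte counts can be checked with a small hand-rolled protobuf encoder. This is only a sketch: the inner field numbers 1 to 3 and the example count of 8000 are assumptions, and I use field number 2 for indexdata in the BlobHeader as I recall it from fileformat.proto (any field number below 16 gives the same one-byte key).

```python
VARINT, LEN = 0, 2  # protobuf wire types


def varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # more bytes follow
        else:
            out.append(low)
            return bytes(out)


def varint_field(number: int, value: int) -> bytes:
    """Key byte(s) followed by a varint value."""
    return varint((number << 3) | VARINT) + varint(value)


def length_delimited(number: int, payload: bytes) -> bytes:
    """Key byte(s), payload length, then the payload itself."""
    return varint((number << 3) | LEN) + varint(len(payload)) + payload


def read_varint(buf: bytes, pos: int):
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def decode_varint_fields(buf: bytes) -> dict:
    """Decode a message that contains only varint fields."""
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        value, pos = read_varint(buf, pos)
        fields[key >> 3] = value
    return fields


# Three counts, one of them non-zero and below 2**14 (hypothetical value).
counts = varint_field(1, 8000) + varint_field(2, 0) + varint_field(3, 0)
counts_overhead = len(length_delimited(2, counts))

# One bool for the entity type present; zero-valued fields are omitted.
flags = varint_field(1, 1)
flags_overhead = len(length_delimited(2, flags))

# The reader treats a missing field as zero.
has_ways = decode_varint_fields(flags).get(2, 0)
```

With these assumptions `counts_overhead` comes out as 9 and `flags_overhead` as 4, so over roughly 80k blobs the difference is about 400 kB.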

> The problem is that osmosis doesn't have functionality to 'push'
> metadata through the pipeline. The pbf reader, as a consumer, has
> barely any metadata available to put in the file header. It has no
> idea if the data is sorted or not so can't set the flag. The solution
> is to improve the XML format so that bounds tags include a key-val
> dictionary, make osmosis propagate that metadata through its
> processing, and have the xml&pbf writers put that metadata into the
> OSMHeader blob or <bounds> XML tag. Existing osmosis filters need to
> be patched to properly set or reset the is-sorted flag. Then ask the
> people who do planet dumps to set the flag on the dbase dumps. (And,
> given we'll have a full key-val dictionary, we can include other
> metadata like URL's for getting minute changesets, datestamps, and
> other goodies.)

I agree, it would be the right approach to get this information in the
XML version first.

>> About 312s to compress all blobs for Germany. Changing the dictionary
>> size does not change much. I lowered it all the way down to 64kb and
>> the values stayed the same essentially.
>>
>
> And deflate?
>

185s

Regards,

Christian Vetter