[OSM-dev] visible-Flag in PBF

Sat May 7 04:46:26 BST 2011

On Fri, May 6, 2011 at 9:24 PM, Christian Vetter <veaac.fdirct at gmail.com> wrote:
> On Sat, May 7, 2011 at 3:58 AM, Scott Crosby <scott at sacrosby.com> wrote:
>> Yes, but via a different mechanism. Create a new blob type:
>> 'OSMInvisibleHistoryData', it will contain a serialized ordinary PrimitiveBlock
>> message. In this block you'll place invisible nodes/ways/relations. Standard
>> readers will see this block and skip right past it. Your reader will feed it to the
>> normal pbf decoder function and have it mark those entities as invisible. You'll
>> want to indicate these files with a new optional_feature 'ContainsInvisibleHistoryData'.
>
> I like the idea. You can have InvisibleNodes / InvisibleWays /
> InvisibleRelations / InvisibleDenseNodes blobs. Their encoding /
> .proto would be exactly the same as for the normal blobs.

I don't see a need to define 4 new blob types, as PrimitiveBlock messages
already contain all of those types. Just one new type,
OSMInvisibleHistoryData, is enough. What's the
motivation otherwise?
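
To make the skipping behaviour concrete, here is a minimal sketch of a
reader loop that dispatches on the BlobHeader type and seeks past anything
it does not recognise. It assumes the usual fileblock framing (4-byte
big-endian header length, then BlobHeader, then the Blob) and the
BlobHeader fields type=1, indexdata=2, datasize=3. The names iter_blobs and
make_fileblock are made up for illustration, and the payload is returned
as the still-serialized Blob, not decompressed.

```python
import io
import struct

def read_varint(buf, pos):
    """Decode one protobuf varint from buf at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def encode_varint(n):
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def parse_blob_header(buf):
    """Hand-rolled parse of BlobHeader: type=1, indexdata=2, datasize=3."""
    header = {"type": None, "indexdata": b"", "datasize": 0}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire = key >> 3, key & 7
        if wire == 0:                       # varint
            value, pos = read_varint(buf, pos)
        elif wire == 2:                     # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError("unexpected wire type %d" % wire)
        if field == 1:
            header["type"] = value.decode("utf-8")
        elif field == 2:
            header["indexdata"] = value
        elif field == 3:
            header["datasize"] = value
    return header

def make_fileblock(blob_type, payload):
    """Hypothetical helper for testing: frame payload as one fileblock.
    payload stands in for a serialized Blob message."""
    name = blob_type.encode("utf-8")
    header = b"\x0a" + encode_varint(len(name)) + name      # field 1, string
    header += b"\x18" + encode_varint(len(payload))         # field 3, varint
    return struct.pack(">I", len(header)) + header + payload

def iter_blobs(stream, wanted_types):
    """Yield (type, serialized_blob) for wanted types; seek past the rest."""
    while True:
        prefix = stream.read(4)
        if len(prefix) < 4:
            return
        (header_len,) = struct.unpack(">I", prefix)
        header = parse_blob_header(stream.read(header_len))
        if header["type"] in wanted_types:
            yield header["type"], stream.read(header["datasize"])
        else:
            # Unknown type such as OSMInvisibleHistoryData: never decoded.
            stream.seek(header["datasize"], io.SEEK_CUR)
```

A history-aware reader would simply add 'OSMInvisibleHistoryData' to
wanted_types and hand that payload to the same PrimitiveBlock decoder.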

> Readers not
> interested in the data can skip invisible objects, existing readers
> would always skip them ( they do not recognize the blobs ).

There's another way of doing that....

>
>> What else do people want to consider putting in? Anything else that would
>> help in recording history? Consider adding LZMA? Someone want to run a test?

One idea: Each BlobHeader includes an 'indexdata' field. We could
store a protocol buffer message there. It is available without parsing
the blob. It needs to be kept small, but adding something like:

message IndexData {
   optional int32 nodecount = 16;
   optional int32 waycount = 17;
   optional int32 relationcount = 18;
}

to each block would only increase the size of a planet by about a
megabyte. There have been other requests for a feature like this.

Can you and others who want this feature please make a convincing case
of why every planet should get about 1 MB bigger?

>
> A small wish list ;-)
>  - dense ways / relations

DenseNodes make sense for two reasons. They reduce the protobuf header
overheads (of about 30% before compression), and they allow delta
coding for id&lat&lon. Nodes also account for >90% of the entities and
most of them have no tags. Ways and relations are individually much
larger, so header overheads matter less, and I already delta compress
their member ids.
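
For anyone unfamiliar with why delta coding helps, a rough sketch follows.
It is not the actual writer code. It assumes the values are stored as
zigzag-encoded varints (as protobuf sint64 fields are), and the helper
names are invented for the example.

```python
from itertools import accumulate

def delta_encode(values):
    """Store each value as the difference from the previous one."""
    out, prev = [], 0
    for v in values:
        out.append(v - prev)
        prev = v
    return out

def delta_decode(deltas):
    """Running sum restores the original values."""
    return list(accumulate(deltas))

def zigzag(n):
    """Map signed to unsigned so small negative deltas stay small (64-bit range)."""
    return (n << 1) ^ (n >> 63)

def varint_size(n):
    """Number of bytes a non-negative integer takes as a protobuf varint."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size

def encoded_size(values):
    return sum(varint_size(zigzag(v)) for v in values)

ids = [1000001, 1000002, 1000005, 1000006]
deltas = delta_encode(ids)          # [1000001, 1, 3, 1]
raw_bytes = encoded_size(ids)       # every id costs a multi-byte varint
delta_bytes = encoded_size(deltas)  # only the first value is large
```

Sorted node ids and neighbouring lat/lon values produce small deltas, which
is where the savings come from.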

>  - optional feature: ordered ( nodes -> ways -> relations ); actually
> some readers already assume this

This is already documented in the Wiki. (See OSMHeader section). The
current osmosis writer doesn't set that feature because, AFAICT,
osmosis gives the writer no way to know that the file is sorted, so it
cannot know to set that flag.

>  - if ordered: position of first node / way / relation block

There's no elegant way to include this information within a pbf file.
It could be stored in an external index file.

There is an alternative approach with a similar effect. If each blob's
indexdata contains the 'nodecount', 'waycount' and 'relationcount'
fields above, a reader only needs to decompress/decode blocks that
contain entities of the type it wants. Skipping unwanted blocks is
extremely fast.
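
A sketch of how a reader might use this, assuming the IndexData message
proposed above (field numbers 16, 17, 18) is what ends up in indexdata.
Nothing here is implemented anywhere yet. The function names are
hypothetical and the protobuf handling is hand-rolled to keep the example
self-contained.

```python
# Field numbers taken from the proposed IndexData message; not a released spec.
FIELD_NAMES = {16: "nodecount", 17: "waycount", 18: "relationcount"}

def read_varint(buf, pos):
    """Decode one protobuf varint from buf at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def encode_varint(n):
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def encode_index_data(nodecount=0, waycount=0, relationcount=0):
    """Serialize the counts; zero counts are omitted (all fields are optional)."""
    out = bytearray()
    for field, count in ((16, nodecount), (17, waycount), (18, relationcount)):
        if count:
            out += encode_varint(field << 3)   # wire type 0 (varint)
            out += encode_varint(count)
    return bytes(out)

def decode_index_data(buf):
    """Parse the counts back out; absent fields read as 0."""
    counts = {name: 0 for name in FIELD_NAMES.values()}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire = key >> 3, key & 7
        if wire != 0:
            raise ValueError("only varint fields expected in this sketch")
        value, pos = read_varint(buf, pos)
        if field in FIELD_NAMES:
            counts[FIELD_NAMES[field]] = value
    return counts

def block_is_needed(indexdata, wanted):
    """Decide from the BlobHeader alone whether the blob must be decoded."""
    if not indexdata:
        return True    # no index present: the reader has to decode to find out
    counts = decode_index_data(indexdata)
    return any(counts[name] > 0 for name in wanted)
```

A reader that only wants ways would call block_is_needed(header_indexdata,
["waycount"]) and seek past every blob for which it returns False.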

>  - optional feature: polygon / bbox used to filter the data set
>  - name / id / url / ... of the data set?

Definitely possible. In fact, I'd propose to put a full-fledged
key-val dictionary into the HeaderBlock message. (In XML, it would be
serialized into the <bounds> message.)

I've tried to keep PBF as simple as possible by making a conscious
effort to avoid specifying features unless I could make immediate use
of them. Sort flags and additional metadata can be easily specified
and added as soon as external software is ready to send them. (HINT
HINT HINT) The few counterexamples to this were things that were dead
simple, led to higher compression, and would be impossible to
retrofit later. (granularity, lat_offset and lon_offset)

Speaking of granularity. Is anyone using that feature to make smaller
pbf files by increasing the granularity from 1cm to 1m?
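
For reference, a small sketch of what changing the granularity does to the
stored integers. It uses the format's conversion, degrees = 1e-9 * (offset +
granularity * stored), with the default granularity of 100 nanodegrees. I am
taking granularity=10000 as the roughly-1m setting, i.e. 100 times the
default; the function names are only for this example.

```python
def to_stored(degrees, granularity=100, offset=0):
    """Quantize a coordinate in degrees to the integer stored in the file."""
    return round((degrees * 1e9 - offset) / granularity)

def to_degrees(stored, granularity=100, offset=0):
    """Inverse conversion, as a reader would apply it."""
    return 1e-9 * (offset + granularity * stored)

lat = 51.5074321

fine = to_stored(lat, granularity=100)       # default resolution
coarse = to_stored(lat, granularity=10000)   # assumed ~1m setting

# Worst-case rounding error is half a granularity step, in degrees.
max_error_fine = 100 * 1e-9 / 2
max_error_coarse = 10000 * 1e-9 / 2
```

The coarse integers are about 100 times smaller, so their deltas are
smaller too and take fewer varint bytes, at the cost of the larger rounding
error.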

>  - include .proto definitions in the file format? -> self describing;
> might be of interest if we ever break compatibility; or if you want to
> read PBF files that were extended by a writer and don't want to hunt
> down the definitions

Not useful. Without knowing the semantics of the message, I don't see
any way to parse it. For instance, how would such a self-describing
parser know which fields were delta-coded? Or which ones used the
keysvals encoding used in DenseNodes?
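
To illustrate: in the .proto the DenseNodes tags are just a repeated
integer field, and the meaning lives in the decoder. A sketch of that
decoding, assuming the layout of alternating key and value string ids per
node with a single 0 ending each node's tags (decode_keys_vals is a name
made up here):

```python
def decode_keys_vals(keys_vals, stringtable, node_count):
    """Split the flat keys_vals array into one tag dict per node.

    Each node contributes (key_id, val_id) pairs followed by a 0 delimiter.
    An empty array is taken to mean that no node in the block has tags.
    """
    if not keys_vals:
        return [{} for _ in range(node_count)]
    tags_per_node = []
    current = {}
    ids = iter(keys_vals)
    for key_id in ids:
        if key_id == 0:
            # Delimiter: this node's tags are complete.
            tags_per_node.append(current)
            current = {}
        else:
            val_id = next(ids)
            current[stringtable[key_id]] = stringtable[val_id]
    return tags_per_node

# Index 0 of the string table is reserved so that 0 can act as the delimiter.
stringtable = ["", "highway", "crossing", "name", "Main St"]
keys_vals = [1, 2, 0, 0, 3, 4, 1, 2, 0]
tags = decode_keys_vals(keys_vals, stringtable, node_count=3)
```

A generic parser driven only by an embedded .proto would hand back the nine
integers and have no idea that they mean three nodes' worth of tags.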

>
> With regard to LZMA: I have some C++ code lying around to compress /
> decompress LZMA... I can test how much it would affect file size /
> decoding speed.

Cool. You don't need a full-fledged PBF reader&writer to test it. Just
enough to parse out blobs and write blobs.
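
Something along these lines would do for a first measurement. This is a
rough sketch in Python rather than your C++, and it assumes you have
already pulled the zlib_data bytes out of a Blob. It uses the xz container
that Python's lzma module writes by default, which is my assumption for the
test, not a decision about what an lzma_data field would hold.

```python
import lzma
import time
import zlib

def timed(func, data):
    """Run func(data) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(data)
    return result, time.perf_counter() - start

def recompress_report(zlib_data):
    """Compare one blob's zlib payload against an LZMA recompression."""
    raw, zlib_secs = timed(zlib.decompress, zlib_data)
    lzma_data = lzma.compress(raw)              # default format: xz container
    roundtrip, lzma_secs = timed(lzma.decompress, lzma_data)
    if roundtrip != raw:
        raise ValueError("LZMA round trip changed the data")
    return {
        "raw_size": len(raw),
        "zlib_size": len(zlib_data),
        "lzma_size": len(lzma_data),
        "zlib_decode_secs": zlib_secs,
        "lzma_decode_secs": lzma_secs,
    }

# Stand-in payload; a real test would loop over the blobs of a planet extract.
sample = zlib.compress(b"node way relation " * 5000)
report = recompress_report(sample)
```

Summing the sizes and decode times over all blobs in a file gives the two
numbers of interest. A single timing per blob is noisy, so averaging over
many blobs matters.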

Scott