[OSM-dev] PBF for History-Files (was: visible-Flag in PBF)

Jochen Topf jochen at remote.org
Tue May 10 17:55:39 BST 2011


On Mon, May 09, 2011 at 05:21:40PM -0500, Scott Crosby wrote:
> On May 8, 2011 5:21 AM, "Jochen Topf" <jochen at remote.org> wrote:
> > Thinking about the inclusion of the invisible-Flag, I think we have to step
> > back a bit:
> >
> > There are two use-cases: One is reading an OSM file with no history
> > information. That's what most people do. The other is reading an OSM file
> > with full (or in the future maybe partial) history information. Those are
> > two very different things and the application has to be aware of which kind
> > of file it is reading. History information does not just mean that there is
> > now an invisible flag; it means the other data has to be interpreted
> > differently. For instance, the ID of an object is no longer unique.
> >
> > So it doesn't make any sense to have a normal OSM file with optional history
> > information that some applications would read and others don't. I think an
> > approach with blocks that interleave current information with historic
> > information, which would allow a reader that only knows about current OSM
> > data to skip some blocks, doesn't make sense.
> 
> Sounds like you and Peter are for putting the invisible flag directly into
> the data. I concur. A user is unlikely to download a dataset with otherwise
> unnecessary historical information unless they have a software stack
> prepared to handle it.
> 
> > In a way "current OSM" and "history OSM" are two very different formats.
> > They share the basic building blocks, like id, version, timestamp, etc. plus
> > one visible flag that's only needed for the history OSM. But the semantics
> > of this data are different.
> 
> > We could now have two different kinds of blocks for every object type: one
> > for current OSM data and one for historic OSM data. The only difference
> > would be the existence of the invisible flag. No PBF file would ever contain
> > both types of these blocks. But that seems like a lot of overhead. Every
> > reader library now has to parse both those kinds of blocks although they are
> > nearly the same.
> 
> No.
> 
> > If we just add the visible flag, the low-level reader libraries for current
> > and historic OSM data are the same; only the upper levels of an application
> > have to take these differences into account.
> >
> > So I propose the following:
> >
> > 1. Add the visible flag as I have already proposed:
> > =========================================
> > --- a/src/osmformat.proto
> > +++ b/src/osmformat.proto
> > @@ -126,6 +126,7 @@ message Info {
> > optional int64 changeset = 3;
> > optional int32 uid = 4;
> > optional uint32 user_sid = 5; // String IDs
> > + optional bool visible = 6 [default = true];
> > }
> >
> > /** Optional metadata that may be included into each primitive. Special dense format used in DenseNodes. */
> > @@ -135,6 +136,7 @@ message DenseInfo {
> > repeated sint64 changeset = 3 [packed = true]; // DELTA coded
> > repeated sint32 uid = 4 [packed = true]; // DELTA coded
> > repeated sint32 user_sid = 5 [packed = true]; // String IDs for usernames. DELTA coded
> > + repeated bool visible = 6 [packed = true];
> > }
> 
> I agree with this in principle. I veto this implementation
> strategy. Info/DenseInfo are designed so that they could be stripped out
> without affecting geographic information. As the visibility flag is critical
> to understanding the map, it cannot be placed in these sections. I'm
> prepared to commit this patch later this week:
> 
> 
> diff --git a/src/osmformat.proto b/src/osmformat.proto
> index 44e24f7..b9c17b2 100644
> --- a/src/osmformat.proto
> +++ b/src/osmformat.proto
> @@ -165,6 +165,7 @@ message Node {
> 
>     required sint64 lat = 8;
>     required sint64 lon = 9;
> +   optional bool deleted = 11;
>  }
> 
>  /* Used to densly represent a sequence of nodes that do not have any tags.
> @@ -190,6 +191,7 @@ message DenseNodes {
> 
>     // Special packing of keys and vals into one array. May be empty if all node
>     repeated int32 keys_vals = 10 [packed = true];
> +   repeated bool deleted = 11;
>  }
> 
> 
> @@ -202,6 +204,7 @@ message Way {
>     optional Info info = 4;
> 
>     repeated sint64 refs = 8 [packed = true];  // DELTA coded
> +   optional bool deleted = 11;
>  }
> 
>  message Relation {
> @@ -222,5 +225,6 @@ message Relation {
>     repeated int32 roles_sid = 8 [packed = true];
>     repeated sint64 memids = 9 [packed = true]; // DELTA encoded
>     repeated MemberType types = 10 [packed = true];
> +   optional bool deleted = 11;
>  }
> 
> 
> Opinions?

It repeats information that should be defined once in my opinion.

It also reverses the meaning of the well-established "visible" flag by calling
it "deleted".

And it doesn't make sense to not have the Info message in a history file, but
have the deleted (or visible) flag alone, because you need the version
information to find out which version of an object you want. There can be
several versions of the same object with visible=true. If you want to find out
the current/last version of an object, you need the one with the highest
version number, provided it has visible=true (otherwise the object has been
deleted). If you want to find a version at a different point in time, you need
the timestamp. So the visible attribute is no different from the version and/or
timestamp attributes, really, at least in the case of the history file.

So I don't see the case where it would make sense to leave out the Info message,
but keep the visible or deleted flag.
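
To make the point concrete, here is a minimal Python sketch of the two lookups
described above. The `Version` record and the function names are hypothetical
(this is not a PBF reader API); it only assumes that each history entry carries
id, version, timestamp and visible.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    """One version of one object, as it might appear in a history file (hypothetical)."""
    id: int
    version: int
    timestamp: int  # seconds since the epoch
    visible: bool


def _latest_visible(candidates: List[Version]) -> Optional[Version]:
    # The highest version number wins; if that version is invisible,
    # the object was deleted and there is nothing to return.
    if not candidates:
        return None
    latest = max(candidates, key=lambda v: v.version)
    return latest if latest.visible else None


def current_state(history: List[Version], obj_id: int) -> Optional[Version]:
    """Latest version of obj_id, or None if it never existed or is deleted."""
    return _latest_visible([v for v in history if v.id == obj_id])


def state_at(history: List[Version], obj_id: int, t: int) -> Optional[Version]:
    """Version of obj_id that was current at time t, or None."""
    return _latest_visible(
        [v for v in history if v.id == obj_id and v.timestamp <= t])


# Object 1: created at t=100, modified at t=200, deleted at t=300.
history = [
    Version(1, 1, 100, True),
    Version(1, 2, 200, True),
    Version(1, 3, 300, False),
]
```

With this data, `current_state(history, 1)` is `None` because version 3 is
invisible, while `state_at(history, 1, 250)` returns version 2. Neither answer
can be computed from the visible flag alone.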

> > =========================================
> >
> > 2. Add some kind of type field that says "This is an OSM file with current
> > data" or "This is an OSM file with historic data". (We should also think
> > about adding a "diff" file type, which is also needed, but I'll leave that
> > for a different discussion.)
> >
> > I am not sure whether this type field should just be a "required_feature"
> > or something else. It's not really a feature. But it would make sense to use
> > this mechanism.
> 
> I propose defining a new 'required_feature' of:
>    ContainsDeletedHistoricalInformation

Maybe better "ContainsHistoricalInformation". I think "deleted" is
confusing, because such a file contains not just deleted objects but also old
versions of the objects.

> The rules are:
> 
>    A file without required_feature ContainsDeletedHistoricalInformation is
>    invalid if it contains anything in the 'invisible' fields. A conforming
>    reader MUST reject any such file.

I'd say:
* A conforming writer MUST NOT write this field.
* A conforming reader MUST ignore this field.

That puts the burden on the writer to do the right thing and makes the reader
a) more robust and b) a little bit faster because it doesn't have to check.

>    A file with required_feature ContainsDeletedHistoricalInformation MUST
>    have an invisible flag for each entity in each block.

Sounds good.
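
A reader following the rules as agreed so far (ignore the flag unless the file
declares the history feature; otherwise require one flag per entity) might look
like the Python sketch below. The feature name is the one suggested in this
thread and the helper is hypothetical; in a real reader the flags would come
from whichever protobuf field is finally chosen.

```python
from typing import List, Sequence

# Name suggested in this thread; not necessarily the final one.
HISTORY_FEATURE = "ContainsHistoricalInformation"


def block_visibility(required_features: Sequence[str],
                     flags: Sequence[bool],
                     entity_count: int) -> List[bool]:
    """Return one visibility flag per entity in a block (hypothetical helper)."""
    if HISTORY_FEATURE not in required_features:
        # Current-data file: a conforming writer never writes the flag and a
        # conforming reader ignores it, so every entity counts as visible.
        return [True] * entity_count
    # History file: the flag must be present for each entity in the block.
    if len(flags) != entity_count:
        raise ValueError(
            f"history block has {entity_count} entities but {len(flags)} flags")
    return list(flags)
```

The first branch never inspects `flags`, which is where the reader saves work:
the burden of not writing the field stays with the writer.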

> In addition:
> 
>   Software is encouraged to support files that contain
>   ContainsDeletedHistoricalInformation wherever feasible, for instance by
>   ignoring any 'deleted' entity.

No. That is not enough to understand historical information. It is much more
complicated to deal with historical information so we should not put any rules
on this in the format description.
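
A small Python illustration of why dropping deleted entities is not enough,
using made-up history rows of the hypothetical form `(id, version, visible)`.

```python
from collections import Counter

# Made-up history rows: (id, version, visible).
history = [
    (7, 1, True),
    (7, 2, True),
    (7, 3, True),
    (8, 1, True),
    (8, 2, False),  # object 8 was deleted in version 2
]

# Naive reading: just skip rows flagged as deleted.
naive = [row for row in history if row[2]]
rows_per_id = Counter(obj_id for obj_id, _, _ in naive)
# Object 7 now appears three times, and the deleted object 8
# still shows up through its old version 1.

# A history-aware reader has to pick the highest version per id first,
# and only then look at the visible flag.
latest = {}
for obj_id, version, visible in history:
    if obj_id not in latest or version > latest[obj_id][0]:
        latest[obj_id] = (version, visible)
current = {obj_id: version
           for obj_id, (version, visible) in latest.items() if visible}
```

Here `current` is `{7: 3}`: one row per object, and object 8 is gone.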

>   To minimize user confusion, files containing historical data will have an
>   .hpbf extension.

I'll put the discussion on extensions in a different thread.

> Because of the existing rule that software must reject any file with a
> required feature it doesn't understand, old software will properly reject
> these files, whatever their filename or extension is. It will also be easy
> to adapt software to read the new format. (I'll handle osmosis.) The
> eventual hope is that once a critical mass of software knows the new format,
> I can simplify the explanation by deprecating the compatibility rules.
> 
> The reason I call these 'required_features' is because the file requires
> that your parser must support them, otherwise you cannot parse the file.
> 
> >
> > The PBF documentation would then explain that you are not supposed to set
> > the visible flag on "current OSM" files and have to ignore its setting when
> > reading those, but you have to set and read it for "historic OSM" files.
> >
> 
> Sounds reasonable.
> 
> Scott
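
The compatibility rule relied on above (software must reject any file with a
required feature it does not understand) can be sketched in Python as follows.
"OsmSchema-V0.6" and "DenseNodes" are existing PBF required features; the
history feature name is the one proposed in this thread, and the function and
exception names are hypothetical.

```python
from typing import Iterable, Set


class UnsupportedFeatureError(Exception):
    """Raised when a file needs a feature this reader does not implement."""


# Required features an existing, history-unaware reader understands.
OLD_READER_FEATURES = {"OsmSchema-V0.6", "DenseNodes"}


def check_required_features(required: Iterable[str], supported: Set[str]) -> None:
    """Reject the file unless every required feature is supported."""
    missing = sorted(set(required) - supported)
    if missing:
        raise UnsupportedFeatureError(
            "unsupported required features: " + ", ".join(missing))


# required_features as a history file's header might list them (proposed name).
history_file = ["OsmSchema-V0.6", "DenseNodes", "ContainsHistoricalInformation"]
```

An old reader calling `check_required_features(history_file,
OLD_READER_FEATURES)` fails regardless of the file name or extension, while a
reader that adds "ContainsHistoricalInformation" to its supported set accepts
the file.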

Jochen
-- 
Jochen Topf  jochen at remote.org  http://www.remote.org/jochen/  +49-721-388298



