[OSM-dev] Timestamp in PBF files
marqqs at gmx.eu
Wed Nov 21 10:50:38 GMT 2012
Hello,
> How many nodes in the planet lack a latitude or longitude? Using a MAXINT
> encoding will cost about 8 bytes for each missing latitude or longitude.
> It's possible to reduce this to 2-3 bytes, but the format gets
> uglier/hackier. IMHO, probably not worth that cost.
As far as I understand, only nodes with the attribute action=delete do not have (or rather, do not need) lon/lat. On the other hand, it does not hurt to give them fake lon/lat values; this is what osmconvert does when you apply the --fake-lonlat option.
In PBF, lon/lat are delta coded, aren't they? In that case it would be best to write a delta of 0, i.e. to reuse the logical value of the previous node. A few steps later in the toolchain, the lon/lat values of action=delete objects will be discarded anyway (together with their objects).
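The delta-of-0 idea can be sketched like this (illustrative helpers, not osmconvert's actual code; the values are in PBF's default units of 100 nanodegrees):

```python
def delta_encode(values):
    """Turn absolute coordinate values into deltas against the previous value."""
    deltas, prev = [], 0
    for v in values:
        deltas.append(v - prev)
        prev = v
    return deltas

def delta_decode(deltas):
    """Invert delta_encode: accumulate deltas back into absolute values."""
    values, prev = [], 0
    for d in deltas:
        prev += d
        values.append(prev)
    return values

# A deleted node can simply repeat the previous node's position by
# storing a delta of 0 -- cheap on the wire, and discarded downstream anyway.
lats = [515000000, 515000100, 515000100]   # third node is the deleted one
assert delta_encode(lats)[2] == 0
assert delta_decode(delta_encode(lats)) == lats
```

A delta of 0 encodes to a single varint byte, which is why it is cheaper than any MAXINT-style "missing value" marker.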
> It should have a planet URI (or a planet URI and a list of mirrors) of
> what planet it corresponds to. That way a user merely needs to say
> 'update planet' and everything else can be automated.
Please don't. These data aren't necessary, and the same applies to sequence numbers.
For about a year now, planet files can be updated with a single "update" command. This command first determines the age of the old file, then downloads all needed planet change files, starting with the newest and ending with the change file that was published right after the old planet file's timestamp.
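The selection of change files can be sketched as follows (a minimal model of the logic described above, assuming daily change files; the function name and structure are illustrative, not osmupdate's real implementation):

```python
from datetime import datetime, timedelta

def change_files_needed(old_timestamp, now, period=timedelta(days=1)):
    """Walk backwards from the newest change file until we reach the first
    one published after the old planet file's timestamp, then return the
    list in application order (oldest first)."""
    needed, t = [], now
    while t > old_timestamp:
        needed.append(t)
        t -= period
    return list(reversed(needed))

old = datetime(2012, 11, 18)
files = change_files_needed(old, datetime(2012, 11, 21))
# three daily change files cover the three-day gap, applied oldest first
```

The point is that everything needed to drive this loop comes from the old file's timestamp and the server's state.txt files, not from metadata stored inside the planet file itself.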
Syntax:
https://wiki.openstreetmap.org/wiki/Osmupdate#Updating_OSM_Files
Since the state.txt files from the OSM planet server have to be parsed in this process anyway, there is no need to include them in the PBF.
> No status, but if anyone wants my opinion, when authoring the format, I
> expected us to add metadata to planets, and expected it to be put into
> OSMHeader as in the OSRM clone you linked to above. I would vote to
> deprecate the use of the ISO timestring encoded into the optional_features
> array, but continue to write to it to avoid breaking old installs of
> Marqqs's tools.
OK, this seems to be the consensus: PBF id 18 in the header block for a signed int UNIX timestamp value.
I will implement the corresponding read function in osmconvert right away.
For compatibility reasons, osmconvert will _write_ both file timestamp representations, the UNIX-based _and_ the string-based one. There may be some tools which depend on the format we have been using for a year now.
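On the wire, field id 18 would follow the usual protobuf rules: a varint-encoded key (field_number << 3 | wire_type) followed by the varint-encoded timestamp. A minimal sketch (illustrative only; positive timestamps assumed, so plain varint rather than zigzag encoding suffices):

```python
def encode_varint(n):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf, pos=0):
    """Decode a varint from buf at pos; return (value, next_pos)."""
    shift = result = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def encode_field_18(timestamp):
    key = (18 << 3) | 0  # field 18, wire type 0 (varint)
    return encode_varint(key) + encode_varint(timestamp)

data = encode_field_18(1353494400)
```

Unknown fields with varint wire type are skipped by conforming readers, which is why adding the field to OSMHeader should not break old tools.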
> Ugh. Yes. That was always somewhat of a problem in the protocol IMHO.
> Nobody
> needs more granularity than seconds because the main database doesn't have
> it.
> Similar for the latitude/longitude granularity. Nobody uses that. And it
> just
> makes all the code reading PBF files a bit more complex and a bit slower.
I totally agree. osmconvert cannot even read PBF files which do not use the standard granularity; it rejects such files with an error message. No one has ever complained, so I guess nobody really needs this option.
Besides, the format definition we have is somewhat unfortunate: the granularity values may come _after_ the lon/lat values they refer to. This makes it necessary to process the data in a block twice: first parse it, and then, in a second pass, apply the granularity factor.
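The ordering problem can be illustrated with a toy reader (a sketch, not PBF parsing code; the field list stands in for wire order):

```python
def decode_block(fields):
    """Decode a block given as (name, value) pairs in wire order.
    Because the granularity field may arrive after the coordinates it
    applies to, raw values must be buffered and scaled in a second pass."""
    raw_lats, granularity = [], 100  # default granularity: 100 nanodegrees
    for name, value in fields:
        if name == "lat":
            raw_lats.append(value)   # cannot scale yet...
        elif name == "granularity":
            granularity = value      # ...the factor may show up later
    # second pass: apply the factor only after the whole block is parsed
    return [v * granularity / 1e9 for v in raw_lats]

# granularity arrives after the coordinates it applies to
block = [("lat", 515000000), ("lat", 515000100), ("granularity", 100)]
lats_deg = decode_block(block)
```

Had granularity been guaranteed to precede the coordinate arrays, the scaling could be done while streaming, in a single pass.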
> > And what I am saying is that we should think this through so that we
> don't
> > have the same problem again tomorrow.
>
> Then please think it through quickly and post the results ;)
Done. Any objections? ;-)
Markus