[OSM-dev] Timestamp in PBF files

Jochen Topf jochen at remote.org
Thu Nov 22 10:08:46 GMT 2012


On Wed, Nov 21, 2012 at 05:16:12PM -0600, Scott Crosby wrote:
> On Wed, Nov 21, 2012 at 3:46 AM, Jochen Topf <jochen at remote.org> wrote:
> 
> > On Tue, Nov 20, 2012 at 09:17:59PM -0600, Scott Crosby wrote:
> > > Not quite. The granularity of timestamps can go down to the milliseconds.
> > >
> > >
> > https://github.com/DennisOSRM/OSM-binary/blob/master/src/osmformat.proto#L96
> >
> > Ugh. Yes. That was always somewhat of a problem in the protocol IMHO.
> > Nobody needs more granularity than seconds, because the main database
> > doesn't have it. The same goes for the latitude/longitude granularity:
> > nobody uses it, and it just makes all the code reading PBF files a bit
> > more complex and a bit slower.
> >
> 
> Today the database lacks those features, but the future can be different.
> The trivial complexity of that feature in readers allows many possible
> future features, without a breaking format change. The ones I had in mind
> were:
> 
> - Lower granularity makes it easy to create lower-precision excerpts
>   that are smaller to send and easier to store.
> - It allows OSM tooling to handle contour lines or other grid-specified
>   data, where making the granularity size match the grid size can lead
>   to vastly improved compression.
> - It supports future higher-precision data, e.g., generated from GPS
>   Block III satellites.
> - Millisecond timestamps are much easier to use as unique changeset IDs
>   than second-granularity timestamps.

On the other hand, it is rather unlikely that OSM will make those changes to its
database anytime soon, or that PBF will be used for non-OSM data like contour
lines (there are better formats and tools for that). Functionality that nobody
actually uses is probably not implemented universally and properly (Markus
already mentioned he doesn't implement it). In the best case, software that
doesn't implement it at least checks for it and complains; in the worst case
there is some buggy code that never gets exercised because nobody ever uses it,
so if and when we actually use those features we can't rely on the software
anyway. And we have changed the PBF format before and are in the process of
changing it again, so it is not such a big deal to add support for these things
later if they are actually needed.

Oh well, this is rather academic, because I am not proposing we change the
format now. I'd only do that if we have a larger overhaul of the format.

> The runtime cost of this is a couple of multiplications that loop-invariant
> code motion can remove; about 30 nanoseconds for each 8000 entity block,
> and is much much cheaper than the branch prediction failures of VarInt
> decoding.
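[The varint decoding mentioned here is the protobuf base-128 encoding: each byte carries 7 payload bits, and the high bit says whether another byte follows. That data-dependent continuation branch is what mispredicts on mixed-length input. A minimal sketch, not any particular library's implementation:]

```cpp
#include <cstdint>
#include <cstddef>

// Decode one protobuf base-128 varint starting at buf.
// The loop's exit condition depends on the data itself (the high bit of
// each byte), which is why the branch predictor struggles when encoded
// lengths vary from value to value.
uint64_t decode_varint(const uint8_t* buf, size_t* consumed) {
    uint64_t result = 0;
    int shift = 0;
    size_t i = 0;
    while (true) {
        uint8_t byte = buf[i++];
        result |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            break;  // continuation bit clear: this was the last byte
        }
        shift += 7;
    }
    *consumed = i;
    return result;
}
```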

I use ints internally in Osmium for the lon/lat, as does PBF. But there is this
conversion in there, and depending on the granularity factor I am not sure I
can actually do it using just integers. I don't want to use doubles though. So
this might break on some granularity factors; I don't know and I never tested
it. I actually use an int to double conversion before the factor is applied and
later convert back to int. And in the usual case for OSM I don't do this double
conversion at all; I just use the int as is, because it already has the right
granularity factor. This extra check (one if that can be perfectly branch
predicted, because it never changes) makes reading the whole PBF file about 1%
faster! double/int conversions are slow. So even this seemingly small thing
means I spent too much time thinking about it and writing code I am not sure is
perfectly right. :-(
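[The fast path described above might look something like this. This is a sketch of the idea, not Osmium's actual code; DEFAULT_GRANULARITY and the function name are illustrative. When the block uses the default granularity and a zero offset, the stored integer is already in the internal units, so the int -> double -> int round trip is skipped entirely behind one perfectly predictable branch.]

```cpp
#include <cstdint>

constexpr int64_t DEFAULT_GRANULARITY = 100;  // nanodegrees, the PBF default
constexpr int64_t DEFAULT_OFFSET = 0;

// Convert a stored coordinate to internal units of 100 nanodegrees.
int64_t decode_coordinate(int64_t stored, int64_t granularity, int64_t offset) {
    if (granularity == DEFAULT_GRANULARITY && offset == DEFAULT_OFFSET) {
        // Common case: the value is already in internal units.
        // This branch never changes within a file, so it predicts perfectly.
        return stored;
    }
    // General case: apply the factor via double and convert back to int,
    // as described in the text above.
    double nanodegrees = static_cast<double>(offset) +
                         static_cast<double>(granularity) *
                         static_cast<double>(stored);
    return static_cast<int64_t>(nanodegrees /
                                static_cast<double>(DEFAULT_GRANULARITY));
}
```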

Jochen
-- 
Jochen Topf  jochen at remote.org  http://www.remote.org/jochen/  +49-721-388298


