[OSM-dev] Timestamp in PBF files

Brett Henderson brett at bretth.com
Sat Nov 24 10:03:03 GMT 2012


Hi Markus,

On 24 November 2012 00:04, <marqqs at gmx.eu> wrote:

> Hi Brett,
>
> > *If* this information is intended to be used as an input into replication
> > processes then the sequence number is essential.  Osmosis writes a
> > timestamp in the state.txt file, but it is only for identifying the right
> > sequence number to begin replication with.  All replication processing
> > requires the sequence number.  Attempting to use a timestamp is
> > theoretically possible but it's much less efficient and not how it was
> > supposed to work.
>
> I think this is true for database based updates, however the sequence
> number is not really needed for file based updates we're presently talking
> about:
>
> For example, osmupdate downloads all change files, starting with the
> newest and going back in time until it has downloaded the oldest change
> file that is still newer than the planet file's timestamp. Then all these
> change files are merged into one big change file, which is then applied
> to the planet file.
>
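A minimal sketch of the walk-back selection described in the quote above,
assuming each change file is represented as a hypothetical
(sequence_number, timestamp) tuple; real osmupdate works on downloaded
files and differs in detail. `select_changes` and `utc` are illustrative
names only.

```python
from datetime import datetime, timezone

def select_changes(change_files, planet_ts):
    """Walk back from the newest change file, collecting every file whose
    timestamp is newer than the planet file's timestamp.

    change_files: list of (sequence_number, timestamp) tuples, oldest first.
    Returns the selected files oldest first, ready to be merged and applied.
    """
    selected = []
    for seq, ts in reversed(change_files):  # newest first
        if ts <= planet_ts:
            break  # this file's changes are already in the planet
        selected.append((seq, ts))
    selected.reverse()  # merge/apply in chronological order
    return selected

def utc(hour):
    return datetime(2012, 11, 24, hour, tzinfo=timezone.utc)

# Hypothetical hourly change files.
files = [(100, utc(5)), (101, utc(6)), (102, utc(7)), (103, utc(8))]
print([seq for seq, _ in select_changes(files, utc(6))])  # [102, 103]
```

Note that stopping at the first file that is not newer than the planet
relies on timestamps increasing from file to file, which is exactly the
assumption the clock-skew caveat later in this message calls into question.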

Yep, that will work for patching planet files.  The replication tasks in
Osmosis can't operate that way though.

The existing --read-replication-interval task allows limits to be specified
to restrict how much change data is downloaded at a time.  This allows a
local database to catch up in smaller steps if it is a long way behind.
Catching up in smaller steps is preferable here for two reasons: it copes
better with the odd processing failure (it's very frustrating to download
weeks of changes only to fail near the end and have to start again), and it
prevents transaction sizes from growing without bound.  Having to wait
several days for one huge catch-up transaction to be processed is far from
ideal.
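A minimal sketch of catching up in bounded steps. As an assumption for
illustration, the limit here is a simple count of change files
(`max_files_per_step`), and `catch_up`, `apply_batch` and the state dict
are hypothetical names, not Osmosis's API. The point is that the stored
sequence number only advances after a batch has been applied, so a failure
costs at most one batch of work.

```python
def catch_up(state, latest_seq, max_files_per_step, apply_batch):
    """Advance state["sequence_number"] towards latest_seq in bounded steps.

    apply_batch(first, last) is a hypothetical callback that downloads and
    applies change files first..last inclusive in one transaction. If it
    raises, the stored sequence number is left untouched, so a retry resumes
    from the last committed batch instead of starting again.
    """
    steps = 0
    while state["sequence_number"] < latest_seq:
        first = state["sequence_number"] + 1
        last = min(latest_seq, state["sequence_number"] + max_files_per_step)
        apply_batch(first, last)          # one bounded transaction
        state["sequence_number"] = last   # record progress only on success
        steps += 1
    return steps

batches = []
state = {"sequence_number": 100}
print(catch_up(state, 125, 10, lambda a, b: batches.append((a, b))))  # 3
print(batches)  # [(101, 110), (111, 120), (121, 125)]
```

Each transaction is capped at `max_files_per_step` files, so a database
that is weeks behind catches up through many small commits rather than one
huge one.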

For patching planet files it's less of an issue.  You'll almost always want
all available changes to be applied, and far fewer files will be downloaded
(you'll typically be using daily or hourly files, not minute files), so
you're less likely to run into an intermittent network connectivity
problem.  Patching a file is also extremely unlikely to throw errors unless
you run out of disk space or have a system crash.

One other thing worth mentioning is that timestamps are not guaranteed to
increase for every change file.  In practice for anything down to minute
files you're unlikely to see any issues, but if the database server clock
skews for any reason there's nothing to prevent time running backwards.
This could lead to consumers relying on timestamps to miss data.  Sequence
numbers on the other hand are guaranteed to always increase per change file.
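A small illustration of how a consumer that remembers the last timestamp
it processed can skip a file when the server clock steps backwards, while
one that remembers the last sequence number cannot. The file list and
timestamps are invented for the example, and the function names are
hypothetical.

```python
from datetime import datetime, timezone

def t(minute):
    return datetime(2012, 11, 24, 10, minute, tzinfo=timezone.utc)

# (sequence_number, timestamp) for consecutive minute files. The clock was
# stepped back before file 4 was written.
files = [(1, t(0)), (2, t(1)), (3, t(2)), (4, t(0)), (5, t(3))]

def new_by_timestamp(files, last_ts):
    return [seq for seq, ts in files if ts > last_ts]

def new_by_sequence(files, last_seq):
    return [seq for seq, _ in files if seq > last_seq]

# A consumer that has processed files 1-3 remembers either t(2) or 3.
print(new_by_timestamp(files, t(2)))  # [5]  (file 4 is silently skipped)
print(new_by_sequence(files, 3))      # [4, 5]
```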

This is all a bit academic for patching planet files, but Osmosis doesn't
make any assumptions about how short the replication intervals are, or what
is consuming changes at the other end of the pipeline.

I could create a new task optimised for patching planet files, and perhaps
that's what I (or somebody else if they wish to step in) will need to do if
we embed replication information into PBF files, but it will have to remain
separate from --read-replication-interval, so there'll be more code to
maintain.  I'm not opposed to it if it makes users' lives easier though.

In summary, I'd prefer to keep using sequence numbers if possible because
it allows me to re-use more existing replication code, but it wouldn't be
impossible to do without them.


> Osmosis may work differently, and it may need the sequence number to start
> this kind of file update - I really don't know. But if so, I totally agree,
> we should make it possible to store sequence numbers in PBF files.
>
> Could also be done with the key-val format I suggested...
>
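Osmosis's state.txt is written as a Java properties file with
`sequenceNumber` and `timestamp` entries (the colons in the timestamp are
escaped as `\:`). A minimal sketch, assuming the replication state were
carried as plain string key/value pairs in a PBF header, of turning a
state.txt into such pairs. The key names produced by `to_header_pairs` are
hypothetical and not part of any agreed format, and `parse_state` handles
only the subset of properties syntax that state.txt uses.

```python
def parse_state(text):
    """Parse the simple key=value lines of a state.txt file.

    Handles only what state.txt uses: '#' comment lines and the backslash
    escape before ':' in the timestamp. Not a full Java properties parser.
    """
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        state[key.strip()] = value.strip().replace("\\:", ":")
    return state

def to_header_pairs(state):
    # Hypothetical key names for embedding replication state in a PBF header.
    return {
        "replication_sequence_number": state["sequenceNumber"],
        "replication_timestamp": state["timestamp"],
    }

# Invented example in the state.txt layout.
example = """#Sat Nov 24 10:02:04 UTC 2012
sequenceNumber=123456
timestamp=2012-11-24T10\\:02\\:02Z
"""
print(to_header_pairs(parse_state(example)))
```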

Cool, I don't have any strong opinions on how the information should be
stored.  I'm happy to leave that in the hands of those more familiar with
the PBF format.

Brett