[OSM-dev] change replication and replication lag

Wed Mar 10 08:47:29 UTC 2021

Hi,

On Tue, Mar 09, 2021 at 11:27:02PM +0100, Stephan Knauss wrote:
> pyosmium doesn't do this and stores only the sequence ID.
> 
> I could request the matching state file and grep for the timestamp,
> but this sounds a bit excessive:
> 
> # wget -q -O - https://planet.openstreetmap.org/replication/minute/004/448/577.state | grep timestamp
> timestamp=2021-03-09T22\:14\:57Z
> 
> What are others using? switch2osm still refers to a script supplied with
> mod_tile that uses osmosis.

pyosmium-get-updates is just a thin wrapper around pyosmium's replication module.
So I usually write my own Python script that stores the replication information
elsewhere (typically in the database, for convenience).

That's what Nominatim does:
https://github.com/osm-search/Nominatim/blob/master/nominatim/tools/replication.py
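In the same spirit, a minimal sketch of keeping the replication state in a
database table. It uses Python's built-in sqlite3 as a stand-in for the real
database, and the table name `replication_status` and its columns are
hypothetical. In a real setup the sequence ID and timestamp would come from
pyosmium's replication module after each update run; 4448577 is the sequence
ID behind the 004/448/577 state file quoted above.

```python
import sqlite3
from datetime import datetime, timezone


def _ensure_table(conn):
    # Hypothetical single-row table holding the last applied state.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS replication_status"
        " (url TEXT, sequence INTEGER, timestamp TEXT)"
    )


def save_replication_state(conn, url, sequence, timestamp):
    """Replace the stored state with (url, sequence, timestamp)."""
    _ensure_table(conn)
    # Keep exactly one row: drop the old state before writing the new one.
    conn.execute("DELETE FROM replication_status")
    conn.execute(
        "INSERT INTO replication_status VALUES (?, ?, ?)",
        (url, sequence, timestamp.isoformat()),
    )
    conn.commit()


def load_replication_state(conn):
    """Return (url, sequence, timestamp), or None if nothing is stored yet."""
    _ensure_table(conn)
    row = conn.execute(
        "SELECT url, sequence, timestamp FROM replication_status"
    ).fetchone()
    if row is None:
        return None
    url, sequence, ts = row
    return url, sequence, datetime.fromisoformat(ts)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    save_replication_state(
        conn,
        "https://planet.openstreetmap.org/replication/minute",
        4448577,
        datetime(2021, 3, 9, 22, 14, 57, tzinfo=timezone.utc),
    )
    print(load_replication_state(conn))
```

Storing the timestamp next to the sequence ID means the lag can be computed
without another request to the server.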

And for osm2pgsql, we now supply a similar script:
https://github.com/openstreetmap/osm2pgsql/blob/master/scripts/osm2pgsql-replication
The script is version-independent, so you can just download it and use it with
whatever osm2pgsql version you are running. The replication state, including
the timestamp, is saved in the table planet_osm_replication_status, from which
you can easily compute the lag.
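As a rough sketch of that computation: assuming the timestamp column in
planet_osm_replication_status is called `importdate` (check the script for the
actual schema), the lag in SQL would be something like
`SELECT now() - importdate FROM planet_osm_replication_status;`. The Python
below does the same arithmetic on a timezone-aware datetime; the function names
are hypothetical.

```python
from datetime import datetime, timezone


def replication_lag(last_change, now=None):
    """Return how far behind the data is, as a timedelta.

    last_change is the timezone-aware timestamp of the newest applied
    change, e.g. as read from the replication status table.
    """
    if last_change.tzinfo is None:
        raise ValueError("last_change must be timezone-aware")
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_change


def format_lag(lag):
    """Render a lag as e.g. '10h 32m' (negative lags are shown as 0h 00m)."""
    minutes = max(0, int(lag.total_seconds())) // 60
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes:02d}m"


if __name__ == "__main__":
    last = datetime(2021, 3, 9, 22, 14, 57, tzinfo=timezone.utc)
    now = datetime(2021, 3, 10, 8, 47, 29, tzinfo=timezone.utc)
    print(format_lag(replication_lag(last, now)))  # prints "10h 32m"
```

Requiring an aware datetime avoids silently mixing local time with the UTC
timestamps the replication server uses.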

> Would it be feasible to extend pyosmium to store the full server state
> file instead of just the sequence? Tools that expect the osmosis behavior
> would then still work.

I've explicitly avoided using the state files because their format is buggy,
in particular the timestamp, whose colons are backslash-escaped, so it can't
simply be handed to date-parsing libraries.
However, I see your point about providing osmosis compatibility.
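For anyone who does want to read the state files directly, a minimal parsing
sketch. The files look like Java properties files, in which `:` is escaped
with a backslash (as in `timestamp=2021-03-09T22\:14\:57Z` above), so values
must be unescaped before date parsing. The function names are hypothetical,
the unescaping handles only simple backslash escapes rather than the full
properties syntax, and `fetch_state` is an untested convenience wrapper.

```python
import re
from datetime import datetime, timezone
from urllib.request import urlopen


def parse_state(text):
    """Parse an osmosis-style state file into a dict of unescaped strings."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and properties-style comments ('#' or '!').
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        # Undo simple backslash escapes, e.g. '22\:14\:57' -> '22:14:57'.
        state[key.strip()] = re.sub(r"\\(.)", r"\1", value.strip())
    return state


def state_timestamp(state):
    """Convert the unescaped 'timestamp' entry into an aware UTC datetime."""
    ts = datetime.strptime(state["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    return ts.replace(tzinfo=timezone.utc)


def fetch_state(url):
    """Download a state file over HTTP and parse it."""
    with urlopen(url) as resp:
        return parse_state(resp.read().decode("utf-8"))
```

Run through `parse_state`, the escaped value becomes the plain string
`2021-03-09T22:14:57Z`, which `strptime` accepts.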

> Pyosmium already fetches the "newest" state file. As long as the diffs fit
> within the size limit, this is the state file that could be persisted.
> Otherwise it would be necessary to fetch the state file for the resulting
> sequence number.
> 
> Any ideas regarding this before filing an extension request?

The state file that pyosmium saves is less than optimal. The initial idea was
to keep it as simple as possible, but that turned out to make the whole update
process more complicated. It should at least save the replication source,
because a sequence ID is always tied to a server. Once we extend the format for
that, it makes perfect sense to store the timestamp as well.
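To illustrate why the source matters, a sketch of a hypothetical extended
state record (this is not pyosmium's actual format) that keeps the replication
source URL, sequence ID and timestamp together and refuses to continue when the
configured server differs from the saved one. JSON is an arbitrary choice here,
and the function names are made up for the example.

```python
import json
from datetime import datetime, timezone

_FMT = "%Y-%m-%dT%H:%M:%SZ"


def dump_state_file(source, sequence, timestamp):
    """Serialise source URL, sequence ID and UTC timestamp as JSON.

    timestamp must be timezone-aware; it is converted to UTC before writing.
    """
    return json.dumps({
        "source": source,
        "sequence": sequence,
        "timestamp": timestamp.astimezone(timezone.utc).strftime(_FMT),
    })


def load_state_file(text, expected_source):
    """Return (sequence, timestamp), refusing state saved for another server."""
    data = json.loads(text)
    # A sequence ID only means something for the server that issued it.
    if data["source"].rstrip("/") != expected_source.rstrip("/"):
        raise ValueError(
            f"state belongs to {data['source']}, not {expected_source}"
        )
    ts = datetime.strptime(data["timestamp"], _FMT).replace(tzinfo=timezone.utc)
    return data["sequence"], ts
```

Writing the timestamp without escapes keeps it directly parseable, which
sidesteps the state-file problem described above.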

Sarah