[OSM-talk] osm2pgsql & planet: frustrations, cutoffs, and idempotence
tom at compton.nu
Mon Oct 27 00:25:35 GMT 2008
Michal Migurski wrote:
> The final event in each weekly planet dump does not fall on an even
> day boundary. In the case of the most recent Oct. 22nd planet.osm, it
> was necessary to experiment with hourly diffs from that day to find
> that the boundary was approx. 2:00pm. Hourlies up to and including
> 2008102213-2008102214.osc.gz failed, hourlies after that succeeded. I
> could go more granular here, checking the minute diffs as well for a
> more precise breakpoint, but it seems odd that the planet dump does
> not break cleanly on a midnight boundary so that it's possible to pick
> up the differences moving forward.
Planet dumps are not snapshots: they do not represent a consistent view
of the data at any single point in time, because they take several hours
to generate and new changes are being made to the database throughout
that time.
I believe, however, that it is supposed to be safe to apply diffs that
overlap with the planet dump in order to bring it to a consistent state.
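For illustration only, here is a minimal sketch of what "safe to apply overlapping diffs" could mean for an importer. It is a hypothetical in-memory model, not how osm2pgsql behaves. It assumes each change carries something that orders edits to the same object (here an integer `version`). It also assumes the time the dump *started* generating is known (`dump_started`), so every hourly diff ending after that moment can be replayed. Creates become upserts, anything not newer than what is stored is skipped, and deletes leave a tombstone, so replaying a diff twice changes nothing. `Change`, `apply_change`, `diffs_to_replay` and the store layout are invented names; only the hourly filename pattern comes from the thread.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple
import re

# Hypothetical model: a key such as ("node", 7) maps to (version, tags).
# A deleted object keeps a tombstone (version, None) so that replaying an
# older create cannot bring it back.
Store = Dict[Tuple[str, int], Tuple[int, Optional[dict]]]

@dataclass(frozen=True)
class Change:
    action: str                  # "create", "modify" or "delete"
    key: Tuple[str, int]         # (object type, id)
    version: int                 # assumed to increase with every edit
    tags: Optional[dict] = None

def apply_change(store: Store, change: Change) -> None:
    """Apply one change so that applying it again has no further effect."""
    current = store.get(change.key)
    if current is not None and current[0] >= change.version:
        return  # the dump or an earlier diff already has this edit or a newer one
    if change.action == "delete":
        store[change.key] = (change.version, None)
    else:
        # "create" and "modify" are both treated as an upsert
        store[change.key] = (change.version, change.tags)

# Hourly diff names as quoted in the thread, e.g. 2008102213-2008102214.osc.gz
HOURLY = re.compile(r"^(\d{10})-(\d{10})\.osc\.gz$")

def _hour(stamp: str) -> datetime:
    # "2008102213" -> 2008-10-22 13:00
    return datetime(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]), int(stamp[8:10]))

def diffs_to_replay(filenames: List[str], dump_started: datetime) -> List[str]:
    """Every hourly diff that ends after the dump began generating, oldest first."""
    picked = []
    for name in filenames:
        m = HOURLY.match(name)
        if m is None:
            continue  # not an hourly diff name
        start, end = _hour(m.group(1)), _hour(m.group(2))
        if end > dump_started:
            picked.append((start, name))
    return [name for _, name in sorted(picked)]
```

For example, if a dump had (hypothetically) started generating at 13:00 on 22 October, `diffs_to_replay` would keep `2008102213-2008102214.osc.gz` and every later hourly file. An object that the dump already captured at version 3 would then ignore the replayed edits up to version 3 and take only the newer ones. The design choice is to start replay conservatively early and rely on idempotence, rather than to search for an exact boundary that does not exist.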
> The cutoff times for files on planet.openstreetmap.org could behave
> more consistently. A weekly dump should end at 11:59pm so that dailies
> can immediately pick up user activity. Hourly and daily dumps should
> be synchronized. This seems more difficult.
As explained above, there is no cutoff time as such, and it isn't
possible to implement one as things stand. It may become possible once we
have working transactions, though it's not clear that a transaction
lasting many hours would be sensible or workable.
BTW I'm not sure why you CCed the OSMF board on this... I don't think it
needs their input at all.
Tom Hughes (tom at compton.nu)