[OSM-dev] Updating Planet and Reliability

Fri Oct 21 12:49:13 BST 2011

Hi Andrew,

On Wed, Oct 19, 2011 at 11:54 PM, Andrew Ayre <andy at britishideas.com> wrote:

> On 7/21/2011 9:19 AM, Andy Allan wrote:
> > On Thu, Jul 21, 2011 at 9:07 AM, Andrew Ayre <andy at britishideas.com>
> wrote:
> >> Keeping my copy of the planet up to date is a two-step process with
> >> Osmosis. Get the latest changes and applying them. This takes about an
> >> hour on my server which is enough time for some other user to reboot the
> >> server without realizing/knowing.
> >>
> >> What protection is there in Osmosis to recover from this without missing
> >> any changes? If none, how are people solving this in their scripts?
>

Osmosis only updates the local state file after downstream pipeline tasks
have completed.  This means that if you're writing a PBF or XML file, that
file will be complete before the local state file gets updated by the --rri
task.  If any problems occur prior to that point, the next time you run
Osmosis it will simply start again from the previous state file, potentially
applying the same changes again.  It aims to be idempotent.

> >
> > It depends if you run things as two separate stages or not. I try to
> > combine anything involving the updates into one command, so that if
> > it's interrupted then the state file hasn't changed and can simply be
> > run again.
> >
> > So that means doing something along the lines of --rri --simc --rx
> > --ac --wx to fetch the changes, simplify, read the planet, apply the
> > changes and write it out, all in one go. If you split this into
> > multiple osmosis calls, then you'll need to approach it differently -
> > perhaps testing whether or not the local changes.osm.gz exists, and if
> > it does, skip downloading and go straight to applying it to the
> > planet.
>
> Hello, I am still trying to get this to work properly. Some of the time
> I end up with a state.txt that looks like this:
>
> #Sat Oct 15 00:24:20 CEST 2011
> sequenceNumber=16402
> timestamp=2011-10-04T08\:00\:00Z
>
> The timestamp in the comment is correct. The timestamp on the last line
> is not - it's stale.
>

The timestamp in the comment specifies when the state file was created; it
is for informational purposes only and is not used anywhere.  The timestamp
field specifies what point in time the state file represents.

In other words, you downloaded this file at Sat Oct 15 00:24:20 CEST 2011:
http://planet.openstreetmap.org/hour-replicate/000/016/402.state.txt

After that processing run, your planet file only contained data up to
October 4th.
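
To make the two timestamps concrete, here is a rough Python sketch that
reads a state.txt like yours and works out which file on the server it
corresponds to.  The function names are made up for illustration, and the
nine-digit, three-group path layout is inferred from the URL above rather
than taken from the Osmosis source.

```python
from datetime import datetime, timezone


def parse_state(text):
    """Parse the key=value lines of a state.txt, skipping '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # the leading comment is informational only
        key, _, value = line.partition("=")
        # Java properties files escape ':' as '\:'; undo that here.
        props[key.strip()] = value.strip().replace("\\:", ":")
    return props


def state_url_path(sequence_number):
    """Map 16402 to '000/016/402' (nine digits in groups of three)."""
    digits = f"{sequence_number:09d}"
    return "/".join(digits[i:i + 3] for i in range(0, 9, 3))


STATE = """#Sat Oct 15 00:24:20 CEST 2011
sequenceNumber=16402
timestamp=2011-10-04T08\\:00\\:00Z
"""

props = parse_state(STATE)
seq = int(props["sequenceNumber"])
data_time = datetime.strptime(
    props["timestamp"], "%Y-%m-%dT%H:%M:%SZ"
).replace(tzinfo=timezone.utc)

print(state_url_path(seq) + ".state.txt")  # 000/016/402.state.txt
print(data_time.isoformat())               # 2011-10-04T08:00:00+00:00
```

The second value is the one that matters: it is the point in time your
planet data represents, regardless of when the file was written.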

>
> On the next attempt to update the planet it fails and I end up with a
> planet size of zero bytes. I don't know the error message because I
> can't get it to always fail. It seems more likely to do this from a
> cronjob I think...
>

Osmosis could be bombing out for any number of reasons.  Do you have logs?
It may simply have not completed processing when it was interrupted.  It
downloads all files before sending data through the pipeline.  It is also
possible that Osmosis is bombing out because you have a large time window to
process and it is trying to download a huge number of change files and apply
them.  It might be running out of file handles (I think the number of
threads is constant but I'm vague on the details now).  The output planet
file will be empty if it fails before writing any data to the destination.
You shouldn't lose data though, because the failure should occur prior to
the local state.txt being updated.
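
If the zero-byte output is the main worry, one option is to have your
script write to a temporary file and only rename it into place when the
run succeeds.  The following is a minimal sketch of that idea, not
something Osmosis does for you; run_then_publish and make_command are
hypothetical names, and the demo uses a tiny Python child process as a
stand-in for the real Osmosis command.

```python
import os
import subprocess
import sys
import tempfile


def run_then_publish(make_command, final_path):
    """Run a command that writes to a temporary file; publish only on success.

    make_command is a hypothetical callback that receives the temporary
    path and returns the argument list to execute.
    """
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".partial")
    os.close(fd)
    try:
        result = subprocess.run(make_command(tmp_path))
        # Treat a non-zero exit code or an empty output file as failure.
        if result.returncode != 0 or os.path.getsize(tmp_path) == 0:
            return False
        # Same directory, so the rename is atomic on POSIX filesystems.
        os.replace(tmp_path, final_path)
        return True
    finally:
        # Clean up the partial file if it was never published.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)


if __name__ == "__main__":
    # Stand-in for Osmosis: a child process that writes four bytes.
    workdir = tempfile.mkdtemp()
    target = os.path.join(workdir, "planet-new.osm.pbf")
    writer = "import sys; open(sys.argv[1], 'wb').write(b'data')"
    ok = run_then_publish(lambda tmp: [sys.executable, "-c", writer, tmp], target)
    print(ok, os.path.exists(target))
```

In a real script make_command would return your existing Osmosis command
line with the temporary path given to --write-bin, so an interrupted run
never leaves a truncated file under the final name.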

What values do you have for maxInterval and baseUrl in the --rri
configuration file?  I'm assuming you're using
baseUrl=http://planet.openstreetmap.org/hour-replicate based on your example
sequence number.  maxInterval shouldn't be set to 0; give it an upper bound
instead.  Some experimentation may be required to find the upper limit for
your system.
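
As a rough way to size the problem, the sketch below estimates how far
behind the planet is and how many runs it would take to catch up.  It
assumes maxInterval is expressed in seconds and caps the time span fetched
per invocation; please check that against the comments in your own
configuration.txt.  The 86400 value is only an example, not a
recommendation, and the function names are made up.

```python
import math
from datetime import datetime, timezone


def backlog_seconds(state_timestamp, now):
    """Seconds between the data timestamp in state.txt and 'now'."""
    return max(0, int((now - state_timestamp).total_seconds()))


def runs_needed(backlog, max_interval):
    """Runs required if each run covers at most max_interval seconds."""
    if max_interval <= 0:
        raise ValueError("max_interval must be a positive number of seconds")
    return math.ceil(backlog / max_interval)


# Data timestamp from the state.txt quoted above.
state_time = datetime(2011, 10, 4, 8, 0, 0, tzinfo=timezone.utc)
# Sat Oct 15 00:24:20 CEST 2011 is 22:24:20 UTC on Oct 14 (CEST = UTC+2).
now = datetime(2011, 10, 14, 22, 24, 20, tzinfo=timezone.utc)

backlog = backlog_seconds(state_time, now)
print(backlog // 3600)              # 254 whole hours behind
print(runs_needed(backlog, 86400))  # 11 runs at one day per run
```

With hourly files that is a couple of hundred downloads if done in one go,
which is the sort of window where a bounded maxInterval helps.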

For updating planet files you may be better off using the
--read-change-interval task instead of --read-replication-interval, simply
because you'll then be able to download daily change files.  I personally
use the following settings in the configuration.txt file:

baseUrl=http://planet.openstreetmap.org/history
intervalLength=86400
changeFileBeginFormat=yyyy/MMdd
changeFileEndFormat=MMdd
maxDownloadCount=100
>
> I'm using osmosis 0.39.
>
> /tools/osmosis --rri workingDirectory=/data/ --simc --read-bin
> /data/planet-current.osm.pbf --buffer bufferCapacity=12000
> --apply-change --buffer bufferCapacity=12000 --write-bin
> /data/planet-111019-135247.osm.pbf
>
> Any ideas what I am doing wrong?
>

Hopefully something above will help.

Brett