[OSM-talk] Create extra Planet files for syncing
Brett Henderson
brett at bretth.com
Tue Jun 24 05:45:40 BST 2008
David Earl wrote:
> Would it really be that much slower? Yes, it is more work, but OTOH it
> is fewer disk writes.
The daily diffs are approximately 6MB of data to write but take a couple
of minutes to produce, so the disk overhead is negligible. This also
means the compression overhead is probably negligible and wouldn't add
much to the overall time. When I first set this up we were dealing with
TIGER imports and much larger daily files; the volumes are much smaller
at the moment.
I'll set up a daily diff using the hourly/minute mechanism and see if
any problems occur.
>
> I rely on these for the Namefinder updates, and I've always been
> worried that they may not form a continuous sequence, especially if
> something goes wrong; the consequence would be that to repair it I'd
> have to do a full database import, which takes a week or so to run.
Understood; my aim is for this to be a reliable means of keeping in sync.
>
> It would be a simple matter to switch to gzip, so long as I know when
> it is to change.
I'll set up the new one in parallel for a little while.
>
> I noticed that the day after the empty file, the file was larger than
> usual. Did it in fact catch the diffs since the previous file, or just
> those from the previous day?
I've just had a quick look; I'm fairly sure data has been missed. The
file is presumably larger because people had queued uploads.
>
> From the Namefinder POV, if I miss a file, I catch up with it later
> (though it did break when you changed the convention to span two days
> a while back after a failure; I fixed that). But if there is a gap
> in the sequence, that's very hard to repair because I'd already have
> applied later updates.
>
> David
I have just re-generated the file from the 19th-20th.
http://planet.openstreetmap.org/daily/daily-20080619-20080620.osc.bz2
If you wish to re-import, just import the files in sequence again.
Assuming your import scripts won't break in some unexpected way, this
will ensure that all updates are applied in the correct order.
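For example, since the filenames encode the date range, a simple shell
loop will apply them in order; the apply-daily-diff.sh script below is
just a placeholder for whatever import script you already use:

  # Apply all daily diffs in date order (the glob sorts lexicographically,
  # which matches date order for these filenames), stopping on the first
  # failure so no later diff is applied out of sequence.
  for f in daily-200806*.osc.bz2; do
      ./apply-daily-diff.sh "$f" || exit 1
  done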
Note that osmosis now has a task called --read-change-interval which can
download all hourly or minute diffs since the last invocation, merge
them into a single changeset, and send it to subsequent tasks in the
osmosis pipeline. The consuming task can be an XML change file writer
or a database writing task if you have one. It tracks the latest
downloaded timestamp in a timestamp file, will generate empty changesets
if no updates are available yet, and will abort completely if the planet
server can't be reached.
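As a rough example (the argument names here are from memory, so check
the --read-change-interval usage documentation before relying on them),
an invocation that pulls everything published since the last run and
writes it out as a single change file might look like:

  # Download and merge all diffs newer than the timestamp recorded in the
  # working directory, then write the merged result to one .osc.gz file.
  osmosis --read-change-interval workingDirectory=changesets \
          --write-xml-change file=since-last-run.osc.gz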