[OSM-talk] Create extra Planet files for syncing
Brett Henderson
brett at bretth.com
Tue Jun 24 05:45:40 BST 2008
David Earl wrote:
> Would it really be that much slower? Yes, it is more work, but OTOH it
> is fewer disk writes.
The daily diffs are approximately 6MB of data to write but take a couple
of minutes to produce, so the disk overhead is negligible. This also
means the compression overhead is probably negligible and wouldn't add
much to the overall time. When I first set this up we were dealing with
TIGER imports and much larger daily files; the volumes are much smaller
at the moment.
I'll set up a daily diff using the hourly/minute mechanism and see if
any problems occur.
>
> I rely on these for the Namefinder updates, and I've always been
> worried that they may not form a continuous sequence, especially if
> something goes wrong; the consequence would be that to repair it I'd
> have to do a full database import, which takes a week or so to run.
Understood; my aim is for this to be a reliable means of keeping in sync.
>
> It would be a simple matter to switch to gzip, so long as I know when
> it is to change.
I'll set up the new one in parallel for a little while.
>
> I noticed that the day after the empty file, the file was larger than
> usual. Did it in fact catch the diffs since the previous file, or just
> those from the previous day?
I've just had a quick look; I'm fairly sure data has been missed. The
file is presumably larger because people had queued uploads.
>
> From the Namefinder POV, if I miss a file, I catch up with it later
> (though it did break when you changed the convention to span two days
> a while back after a failure; I fixed that). But if there is a gap
> in the sequence, that's very hard to repair because I'd already have
> applied later updates.
>
> David
I have just re-generated the file from the 19th-20th.
http://planet.openstreetmap.org/daily/daily-20080619-20080620.osc.bz2
If you wish to re-import, just import the files in sequence again.
Assuming your import scripts won't break in some unexpected way, this
will ensure that all updates are applied in the correct order.
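For example, since the filenames encode the date range, a simple shell
loop will apply them in order; the apply-daily-diff.sh script below is
just a placeholder for whatever import script you already use:

  # Apply all daily diffs in date order (the glob sorts lexicographically,
  # which matches date order for these filenames), stopping on the first
  # failure so no later diff is applied out of sequence.
  for f in daily-200806*.osc.bz2; do
      ./apply-daily-diff.sh "$f" || exit 1
  done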
Note that osmosis now has a task called --read-change-interval which can
download all hourly or minute diffs since the last invocation, merge
them into a single changeset, and send it to subsequent tasks in the
osmosis pipeline. The consuming task can be an XML change file writer
or a database writing task if you have one. It tracks the latest
downloaded timestamp in a timestamp file, will generate empty changesets
if no updates are available yet, and will abort completely if the planet
server can't be reached.
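As a rough example (the argument names here are from memory, so check
the --read-change-interval usage documentation before relying on them),
an invocation that pulls everything published since the last run and
writes it out as a single change file might look like:

  # Download and merge all diffs newer than the timestamp recorded in the
  # working directory, then write the merged result to one .osc.gz file.
  osmosis --read-change-interval workingDirectory=changesets \
          --write-xml-change file=since-last-run.osc.gz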