[OSM-talk] Create extra Planet files for syncing

Mon Jun 23 12:09:17 BST 2008

On 23/06/2008 12:00, Brett Henderson wrote:
> Frederik Ramm wrote:
>> Sure, I can do that. I was assuming they would somehow be incomplete 
>> as well but if they are ok then I can just switch to using these.
> The hourly & minute are definitely more reliable than daily.  I won't be 
> surprised to see missing data in daily files.  If you see *any* problems 
> with the hourly or minute files I'd like to know about it.
> 
> The daily diffs are created using a shell script that doesn't fail 
> gracefully and doesn't have the ability to generate multiple files if 
> the db has been down for a long period.  The hourly and minute diffs are 
> created using the osmosis-mysql-extract application (included in the 
> osmosis distribution) which aborts if the db is down (spamming me with 
> cron failures every minute ...) and catches up again when the db comes 
> up again generating as many files as required to become current again.  
> The minute diffs encounter intermittent failures on a regular basis due 
> to the db or connectivity being lost for brief periods but always 
> recover again.
> 
> It would take literally 5 minutes to set up daily diffs with the new 
> mechanism.  The gzip format is something people would presumably get 
> used to fairly quickly.  But the downside is that the compressed files 
> are created directly whereas the daily script extracts to an 
> uncompressed file to minimise db locking time then compresses 
> separately.  If we eventually switch to all InnoDB tables where locking 
> isn't such an issue I'll definitely cut it over.

Would it really be that much slower? Yes, it is more work, but OTOH it 
means fewer disk writes.

I rely on these for the Namefinder updates, and I've always been worried 
that they may not form a continuous sequence, especially if something 
goes wrong. If that happened, to repair it I'd have to do a full 
database import, which takes a week or so to run.

It would be a simple matter to switch to gzip, so long as I know when it 
is to change.

I noticed that the day after the empty file, the file was larger than 
usual. Did it in fact capture all the changes since the previous file, 
or just those from the previous day?

From the Namefinder POV, if I miss a file, I can catch up with it later 
(it did break when you changed the convention to span two days a while 
back after a failure, but I fixed that). But a gap in the sequence is 
very hard to repair, because I'd already have applied later updates.
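
To illustrate the kind of continuity check I have in mind, here is a 
rough Python sketch. It assumes, purely for illustration, that each diff 
is named <start>-<end>.osc.gz with YYYYMMDD dates; the real naming on 
the server may differ, and find_gaps and parse_span are hypothetical 
helper names. A file is allowed to span more than one day, and a gap is 
reported wherever one file's start is later than the latest end date 
seen so far.

```python
import re
from datetime import datetime

# Assumed (hypothetical) naming: <start>-<end>.osc or .osc.gz, dates as YYYYMMDD.
_NAME = re.compile(r"^(\d{8})-(\d{8})\.osc(?:\.gz)?$")


def parse_span(filename):
    """Return the (start, end) dates encoded in a diff filename."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError("unrecognised diff filename: %r" % filename)
    start, end = (datetime.strptime(g, "%Y%m%d").date() for g in match.groups())
    if end <= start:
        raise ValueError("diff covers no time: %r" % filename)
    return start, end


def find_gaps(filenames):
    """Return a list of (covered_to, next_start) pairs where no diff exists."""
    spans = sorted(parse_span(name) for name in filenames)
    gaps = []
    covered_to = None  # latest end date covered by the files seen so far
    for start, end in spans:
        if covered_to is not None and start > covered_to:
            gaps.append((covered_to, start))
        if covered_to is None or end > covered_to:
            covered_to = end
    return gaps
```

Running something like this over the directory listing before applying 
anything would at least let me stop at the first hole, rather than 
applying later updates on top of it.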

David