[OSM-dev] Daily Planet.osm

David Earl david at frankieandshadow.com
Mon Apr 30 10:41:35 BST 2007



> -----Original Message-----
> From: dev-bounces at openstreetmap.org
> [mailto:dev-bounces at openstreetmap.org]On Behalf Of Thomas Walraet
> Sent: 30 April 2007 08:44
> To: Dev Openstreetmap
> Subject: Re: [OSM-dev] Daily Planet.osm
>
>
> Sebastian Spaeth wrote:
> >
> > Is there any progress (or interest) on having e.g. full monthly
> > snapshots and incremental weekly ones? this could probably save us quite
> > some space.
>
> Incremental dumps have already been discussed. I think it is a good
> solution ...


I wonder whether people might be interested in the approach I've taken with
another project, which may be instructive here. There is nothing magic or
innovative about it, but there are parallels.

I am running a MySQL server through a PHP interface. I do a full backup(*)
once a week or so (my data set is about half the size of OSM's now). My
backups are genuinely for data integrity, but they have much the same status
as the planet dump. I do them as SQL INSERTs, but that's just for
convenience - they used to be CSVs, and could equally well be XML.
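
In outline, the dump is no more than a loop like the following (a much
simplified sketch - table names, paths and connection details are made up,
and NULLs and very large tables would need more care):

  <?php
  // Simplified weekly full dump as SQL INSERTs (illustrative names only).
  $db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
  $out = fopen('/backups/full-' . date('Ymd') . '.sql', 'w');
  foreach (array('nodes', 'segments', 'ways') as $table) {
      // FETCH_NUM gives plain numeric rows, easy to turn into VALUES lists
      foreach ($db->query("SELECT * FROM $table", PDO::FETCH_NUM) as $row) {
          $values = array_map(array($db, 'quote'), $row);
          fwrite($out, "INSERT INTO $table VALUES ("
                     . implode(',', $values) . ");\n");
      }
  }
  fclose($out);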

However, I can't afford a week's data loss in the event of disaster. A team
of people is inputting data, and a week of their wasted time would be
expensive. So I do incremental backups too. These consist of whatever
modifying SQL is generated (INSERT, DELETE and UPDATE in practice), but
again they could equally well be XML. All the SQL requests go through a
common interface, so it is easy to generate the log as a side effect. I
used to do this at a higher semantic level, but I worried that as I made
code changes I would forget to update the backup code, or get it wrong,
leading to a failed recovery in a disaster - a concern partly prompted by
a near disaster.
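
In outline the interface looks something like this (a simplified sketch -
function and path names are made up, and error handling is omitted):

  <?php
  // All queries funnel through here; modifying SQL is appended to the
  // change log as a side effect, under an exclusive lock.
  function run_query(PDO $db, $sql) {
      $result = $db->query($sql);
      if ($result !== false
              && preg_match('/^\s*(INSERT|UPDATE|DELETE)\b/i', $sql)) {
          $log = fopen('/var/backups/changes.sql', 'a');
          flock($log, LOCK_EX);        // the rotation job waits on this lock
          fwrite($log, $sql . ";\n");
          flock($log, LOCK_UN);
          fclose($log);
      }
      return $result;
  }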

Every two hours, a cron job on another server locks the change-log file for
read (modifying transactions hold an exclusive write lock on it, so the job
waits briefly if the file is in use), renames it (causing the main server to
start a fresh file on the next transaction) and copies it off the main
server, with various safeguards to check its integrity. So by loading the
full backup and replaying the incrementals in sequence, I am never more than
two hours out of date in the event of disaster, and the bandwidth for the
incremental files is easily manageable.
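
The local half of the rotation amounts to something like this (again a
sketch with made-up paths; the copy off the server and the integrity
checks are reduced to a comment):

  <?php
  // Two-hourly rotation of the change log.
  $log = '/var/backups/changes.sql';
  $rotated = $log . '.' . date('YmdHis');
  $fh = fopen($log, 'r');
  if ($fh !== false && flock($fh, LOCK_SH)) { // waits out in-progress writes
      rename($log, $rotated);  // writers append by path, so the next
                               // transaction starts a fresh file
      flock($fh, LOCK_UN);
      fclose($fh);
      // ...then copy $rotated off the main server and verify its
      // integrity (e.g. compare checksums) before relying on it.
  }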

The key point is that the log is a side effect of each transaction with the
database, so in principle the same mechanism could be replicating the
database elsewhere, at (in my case) a two-hour lag. If the log were a set of
'delete', 'update' and 'insert' records in an XML file, it would look very
much like an incremental change record for the OSM planet file:
  <node id='...' do='update' lon='...' lat='...' ts='...'>
    <tag .../>
  </node>
  <segment id='...' do='delete' .../>
  ...
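
Replaying such a file on a recovery copy would then be a simple loop -
something like this, assuming the records are wrapped in a root element,
with apply_insert() and friends standing in for real per-element handlers:

  <?php
  // Hypothetical replay of an incremental change file.
  $changes = simplexml_load_file('changes.xml');
  foreach ($changes->children() as $element) {
      switch ((string) $element['do']) {
          case 'insert': apply_insert($element); break;
          case 'update': apply_update($element); break;
          case 'delete': apply_delete($element); break;
      }
  }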

This is probably all very obvious and may simply repeat what you've already
talked about. If so, sorry.

David

(*) PS: the mention of backups makes me ask - what provision is there for
disaster recovery in OSM? I know the database holds much more than the
planet file contains, such as audit trails and history.




