[OSM-dev] Datacorruption Part IV; where way members are not in the planet

Brett Henderson brett at bretth.com
Fri Nov 28 00:56:06 GMT 2008


Frederik Ramm wrote:
> Hi,
>
> Stefan de Konink wrote:
>   
>> I think it would be very good if the dump process was done in the 
>> following sequence; first relations, then ways then nodes. In that case 
>> all dependencies are resolved. 
>>     
>
> No, because just after you have dumped the ways and before you have 
> dumped the nodes, someone might delete a way and all its nodes; you have 
> already dumped the way but will not dump the nodes -> inconsistency.
>
> The only way to do this right is to work from the history tables like 
> the diffs do, but that currently is too expensive in terms of database load.
>
> The problem is not that big because, as has been pointed out already, 
> you can always "upgrade" a planet file to be consistent by applying the 
> matching daily diff.
>
>   
Stefan, have you tried osmosis and the daily diff files?  They'll solve 
all of these timing-related inconsistencies.  You basically take a 
planet, then apply the diffs that overlap the period in which the planet 
was produced.
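
A toy sketch of why this works, using the scenario Frederik describes 
above (a way is dumped, then the way and its nodes are deleted before 
the nodes are dumped).  This is not osmosis code: the dict layout and 
the names apply_diff and dangling_ways are made up for illustration, 
and the diff is assumed to be ordered and to cover the whole dump 
window.

```python
# Toy model only: entities are keyed by (type, id). None of these names
# come from osmosis; they are assumptions made for this illustration.

def apply_diff(snapshot, diff):
    """Replay an ordered list of changes; the last change to an entity wins."""
    result = dict(snapshot)
    for action, key, payload in diff:
        if action == "delete":
            result.pop(key, None)  # harmless if the dump never saw the entity
        else:  # "create" and "modify" both overwrite whatever was dumped
            result[key] = payload
    return result


def dangling_ways(snapshot):
    """Return ids of ways that reference nodes missing from the snapshot."""
    broken = []
    for (kind, ident), payload in snapshot.items():
        if kind != "way":
            continue
        if any(("node", n) not in snapshot for n in payload["nodes"]):
            broken.append(ident)
    return sorted(broken)


# Way 10 was dumped, then way 10 and nodes 1 and 2 were deleted before
# the node table was dumped, so the planet holds a way without its nodes.
planet = {
    ("way", 10): {"nodes": [1, 2]},
    ("way", 11): {"nodes": [3, 4]},
    ("node", 3): {"lat": 0.0, "lon": 0.0},
    ("node", 4): {"lat": 0.0, "lon": 1.0},
}

# The diff covering the dump window records all three deletions.
daily_diff = [
    ("delete", ("way", 10), None),
    ("delete", ("node", 1), None),
    ("delete", ("node", 2), None),
]

patched = apply_diff(planet, daily_diff)
```

Here dangling_ways(planet) reports way 10, while dangling_ways(patched) 
reports nothing: every entity touched during the dump window ends up in 
its state as of the end of the diff, and untouched entities were dumped 
correctly in the first place.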

The planet creation process has a number of warts.  It was created back 
when the database was orders of magnitude smaller than it is 
now.  In my opinion it is reaching the limits of its usefulness.  But I 
don't want to see somebody create a better planet dump application that 
produces a consistent snapshot because the whole idea is flawed for a 
dataset the size of ours.  The whole point of me creating osmosis 
changesets was to eliminate the need for a planet creation process.  
This has two benefits: it reduces load on the main database, and it 
allows a far shorter replication interval than weekly.  Other issues such as 
long data import times become less relevant because you only have to do 
them once.  In time I'd like to see the main database only used for 
online editing, and changeset creation.  Everything else can be done 
downstream with a change feed.  As a side note, osmosis already has a 
consistent snapshot process that can use history tables instead of 
current tables but for both performance reasons and data issues this is 
not viable at the moment.
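
To illustrate what a snapshot taken from history rather than from 
current tables means, here is a minimal sketch.  It is not how osmosis 
implements it: the record fields (type, id, timestamp, visible) and the 
function name snapshot_at are assumptions for illustration only.

```python
# Illustrative only: the record layout below is an assumption, not the
# real OSM history schema or any osmosis data structure.

def snapshot_at(history, cutoff):
    """Return the visible entities as they stood at time `cutoff`."""
    latest = {}
    for rec in history:
        if rec["timestamp"] > cutoff:
            continue  # change happened after the snapshot instant
        key = (rec["type"], rec["id"])
        seen = latest.get(key)
        if seen is None or rec["timestamp"] >= seen["timestamp"]:
            latest[key] = rec
    # Entities whose most recent record is a deletion are left out.
    return {key: rec for key, rec in latest.items() if rec["visible"]}


history = [
    {"type": "node", "id": 1, "timestamp": 1, "visible": True},
    {"type": "node", "id": 2, "timestamp": 1, "visible": True},
    {"type": "way", "id": 10, "timestamp": 2, "visible": True},
    {"type": "way", "id": 10, "timestamp": 5, "visible": False},
    {"type": "node", "id": 1, "timestamp": 6, "visible": False},
    {"type": "node", "id": 2, "timestamp": 6, "visible": False},
]

at_4 = snapshot_at(history, 4)  # way 10 and both of its nodes
at_5 = snapshot_at(history, 5)  # the way is gone, the nodes remain
at_6 = snapshot_at(history, 6)  # everything has been deleted
```

Because every entity is evaluated at the same instant, no way can be 
caught between its own deletion and that of its nodes, whichever order 
the tables are read in.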

When 0.6 comes out and full version info is available I intend to 
generate a complete set of history files for OSM.  Using that it will be 
possible to generate a complete replica of production including 
history.  If that works out and no wrinkles appear it should be possible 
to change the planet creation process to work 100% from changesets.

As for the other inconsistencies, the effort you're expending to clean 
up the db is possibly not wasted because I assume it will need to be 
done as part of the 0.5 to 0.6 migration process anyway.  However, if 
you're doing it to get a clean import for your own purposes, I think your 
efforts are misguided.  As Frederik has pointed out, you'll never get a 
100% clean snapshot.  Like everybody else, you'll just 
have to suck it up until 0.6 is ready.  To be honest, if we fix all of 
these issues in 0.5 it will reduce the urgency to get 0.6 out the door 
and distract efforts to do so.  OSM has survived for the last few years 
with a non-perfect solution, another few months isn't going to hurt 
much.  I've also been horrified on a number of occasions when I realise 
how things work, but have come to realise that a solution that works 
most of the time is far preferable to something which is perfect but not 
released yet.

I'd like to see you spend a little more time thinking before you make 
statements saying how things should be done.  It's easy to make 
statements about what is wrong and how things could be better.  It is 
harder to come up with working solutions that fix these problems.  It is 
far, far more difficult to build the strong working relationships with 
the large groups of people needed to bring a complete project ecosystem 
into being.  So far you have rubbed a lot of people the wrong way.  Most 
of what you are saying makes technical sense, but your approach to getting 
people on board with your ideas is not working, to put it mildly.

I'm getting way off topic ...

Brett