[OSM-dev] Datacorruption Part IV; where way members are not in the planet
Brett Henderson
brett at bretth.com
Fri Nov 28 00:56:06 GMT 2008
Frederik Ramm wrote:
> Hi,
>
> Stefan de Konink wrote:
>
>> I think it would be very good if the dump process was done in the
>> following sequence; first relations, then ways then nodes. In that case
>> all dependencies are resolved.
>>
>
> No, because just after you have dumped the ways and before you have
> dumped the nodes, someone might delete a way and all its nodes; you have
> already dumped the way but will not dump the nodes -> inconsistency.
>
> The only way to do this right is to work from the history tables like
> the diffs do, but that currently is too expensive in terms of database load.
>
> The problem is not that big because, as has been pointed out already,
> you can always "upgrade" a planet file to be consistent by applying the
> matching daily diff.
>
>
Stefan, have you tried osmosis and the daily diff files? They'll solve
all of these timing-related inconsistencies. You basically take a
planet, then apply the diffs that overlap the period in which the planet
was produced.
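To make the idea concrete, here is a rough sketch of why applying an overlapping diff repairs the planet. It is a toy in-memory model, not osmosis code: the dict layout, the tuple shape of a change, and the names dangling_refs and apply_diff are all made up for illustration.

```python
# Toy model only (assumed data shapes, not the osmosis implementation).
# A "planet" is two dicts; a "diff" is an ordered list of changes.

def dangling_refs(ways, nodes):
    """Return {way_id: [missing node ids]} for ways that reference absent nodes."""
    missing = {}
    for way_id, refs in ways.items():
        absent = [ref for ref in refs if ref not in nodes]
        if absent:
            missing[way_id] = absent
    return missing


def apply_diff(ways, nodes, diff):
    """Apply (action, kind, id, payload) changes in order and return new dicts."""
    ways, nodes = dict(ways), dict(nodes)
    for action, kind, obj_id, payload in diff:
        target = ways if kind == "way" else nodes
        if action == "delete":
            target.pop(obj_id, None)
        else:  # "create" or "modify"
            target[obj_id] = payload
    return ways, nodes


# Ways were dumped first. Way 10 and its nodes 1 and 2 were then deleted
# before the node dump ran, so the planet holds the way but not its nodes.
planet_ways = {10: [1, 2], 11: [3, 4]}
planet_nodes = {3: (51.0, 0.1), 4: (51.1, 0.2)}

# The diff covering the dump period records those deletions.
diff = [
    ("delete", "way", 10, None),
    ("delete", "node", 1, None),
    ("delete", "node", 2, None),
]

before = dangling_refs(planet_ways, planet_nodes)
fixed_ways, fixed_nodes = apply_diff(planet_ways, planet_nodes, diff)
after = dangling_refs(fixed_ways, fixed_nodes)

print(before)  # {10: [1, 2]}
print(after)   # {}
```

The point is only that the diff carries every change made while the dump was running, so replaying it brings all three object types to the same point in time.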
The planet creation process has a number of warts. It was created in
the days when the database was orders of magnitude smaller than it is
now. In my opinion it is reaching the limits of its usefulness. But I
don't want to see somebody create a better planet dump application that
produces a consistent snapshot because the whole idea is flawed for a
dataset the size of ours. The whole point of me creating osmosis
changesets was to eliminate the need for a planet creation process.
This has two benefits: it reduces load on the main database, and it allows
a far shorter replication interval than weekly. Other issues such as
long data import times become less relevant because you only have to do
them once. In time I'd like to see the main database used only for
online editing and changeset creation. Everything else can be done
downstream with a change feed. As a side note, osmosis already has a
consistent snapshot process that can use history tables instead of
current tables, but for both performance reasons and data issues this is
not viable at the moment.
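For anyone wondering what a snapshot from history tables means in practice, the following is a minimal sketch of the principle. It assumes a hypothetical row shape of (id, version, timestamp, visible, data) and that every row carries a version number, which is not fully the case before 0.6. The function name snapshot_at is mine, not something osmosis provides.

```python
# Sketch under assumptions: hypothetical history rows, not the real schema.

def snapshot_at(history, cutoff):
    """Return {id: data} as it stood at `cutoff`.

    For each object, keep the highest version whose timestamp is not
    after the cutoff, then drop objects whose chosen version is deleted.
    """
    latest = {}
    for obj_id, version, timestamp, visible, data in history:
        if timestamp > cutoff:
            continue  # this change happened after the snapshot time
        if obj_id not in latest or version > latest[obj_id][0]:
            latest[obj_id] = (version, visible, data)
    return {
        obj_id: data
        for obj_id, (_version, visible, data) in latest.items()
        if visible
    }


# Rows: (id, version, timestamp, visible, data)
node_history = [
    (1, 1, 100, True, "a"),
    (1, 2, 205, False, None),  # node 1 deleted at t=205
    (2, 1, 150, True, "b"),
    (2, 2, 300, True, "b2"),
]

at_200 = snapshot_at(node_history, 200)
at_250 = snapshot_at(node_history, 250)
print(at_200)  # {1: 'a', 2: 'b'}
print(at_250)  # {2: 'b'}
```

Because every object type is cut at the same timestamp, edits made while the dump runs cannot make nodes, ways and relations disagree. The cost is a scan over the much larger history tables, which is the performance problem mentioned above.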
When 0.6 comes out and full version info is available I intend to
generate a complete set of history files for OSM. Using that it will be
possible to generate a complete replica of production including
history. If that works out and no wrinkles appear it should be possible
to change the planet creation process to work 100% from changesets.
As for the other inconsistencies, the effort you're expending to clean
up the db is possibly not wasted because I assume it will need to be
done as part of the 0.5 to 0.6 migration process anyway. However, if
you're doing it to get a clean import for your own purposes, I think your
efforts are misguided. As Frederik has pointed out, you'll never get a
100% clean snapshot. Like everybody else, you'll just
have to suck it up until 0.6 is ready. To be honest, if we fix all of
these issues in 0.5 it will reduce the urgency to get 0.6 out the door
and distract from efforts to do so. OSM has survived for the last few years
with a non-perfect solution; another few months isn't going to hurt
much. I've also been horrified on a number of occasions when I realised
how things work, but have come to realise that a solution that works
most of the time is far preferable to something which is perfect but not
released yet.
I'd like to see you spend a little more time thinking before you make
statements saying how things should be done. It's easy to make
statements about what is wrong and how things could be better. It is
harder to come up with working solutions that fix these problems. It is
far, far more difficult to build the strong working relationships with large
groups of people that are necessary to bring a complete project ecosystem into
being. So far you have rubbed a lot of people the wrong way. Most of
what you are saying makes technical sense, but your approach to getting
people on board with your ideas is not working, to put it mildly.
I'm getting way off topic ...
Brett