[OSM-dev] Referential Integrity Report
Brett Henderson
brett at bretth.com
Tue Oct 16 12:43:16 BST 2007
Hi All,
I've created a new osmosis task for reporting on referential integrity
errors. It was originally created in response to an issue Frederik Ramm
was having when importing an inconsistent planet file into a database.
He needed the import so he could fix the mysql sequence ids and avoid
problems when subsequently creating data via the api. It takes about 15
minutes to run on a planet
modified to use "standard" utc dates.
Usage is:
osmosis --read-xml file=planet.osm --report-integrity file=report.txt
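For readers who want a feel for what such a check involves, here is a
minimal Python sketch. It is not how osmosis implements
--report-integrity. It assumes the 0.5-style XML layout where a way lists
its nodes as <nd ref="..."/> children and where all nodes appear before
the ways, as in a planet dump. The function name and the sample data are
made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def find_dangling_refs(source):
    """Return (way_id, node_id) pairs where a way references an unknown node.

    Assumes nodes appear before ways in the input, as in a planet dump.
    """
    node_ids = set()
    dangling = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "node":
            node_ids.add(int(elem.get("id")))
            elem.clear()  # drop attributes and children we no longer need
        elif elem.tag == "way":
            way_id = int(elem.get("id"))
            for nd in elem.iter("nd"):
                ref = int(nd.get("ref"))
                if ref not in node_ids:
                    dangling.append((way_id, ref))
            elem.clear()
    return dangling


# Tiny made-up sample: way 11 refers to node 0 and node 99, neither exists.
SAMPLE = b"""<osm version="0.5">
  <node id="1" lat="0.0" lon="0.0"/>
  <node id="2" lat="0.0" lon="0.1"/>
  <way id="10"><nd ref="1"/><nd ref="2"/></way>
  <way id="11"><nd ref="2"/><nd ref="0"/><nd ref="99"/></way>
</osm>"""

if __name__ == "__main__":
    for way_id, node_id in find_dangling_refs(io.BytesIO(SAMPLE)):
        print(f"way {way_id} refers to missing node {node_id}")
```

Holding every node id in a Python set is fine for a small extract but
would need a lot of memory on a full planet, so treat this purely as an
illustration of the idea.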
I've run it on the 071010 planet file with some interesting results.
The report is available from:
http://www.bretth.com/osmosis/integrity-report-071010.txt.gz
A large number of issues, primarily at the end of the report, are due to
the inconsistent snapshot problem in the planet generation process.
Nodes are read some time before ways, which leads to a number of ways
referring to non-existent nodes. This is a known quirk with the
existing process. For this reason, any report entries specifying issues
with nodes greater than approximately 62177000 can be ignored for now.
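The cutoff rule above can be sketched in a few lines of Python. The
layout of the report file is not described here, so this sketch works on
plain (way_id, node_id) pairs rather than parsing the report. The
function name is hypothetical, and the third sample pair is invented
purely to show an id above the cutoff.

```python
# Approximate node id above which missing nodes are put down to the
# snapshot quirk (value taken from the text above).
SNAPSHOT_CUTOFF = 62_177_000


def split_by_cutoff(pairs, cutoff=SNAPSHOT_CUTOFF):
    """Split (way_id, node_id) pairs into (worth_review, likely_artifacts)."""
    worth_review = [p for p in pairs if p[1] <= cutoff]
    likely_artifacts = [p for p in pairs if p[1] > cutoff]
    return worth_review, likely_artifacts


pairs = [
    (4018772, 60083908),  # way/node pair mentioned in this email
    (1881900, 0),         # way referring to node id 0
    (9999999, 62200000),  # invented pair, only here to exceed the cutoff
]

if __name__ == "__main__":
    review, artifacts = split_by_cutoff(pairs)
    print("worth a look:", review)
    print("probably snapshot artifacts:", artifacts)
```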
What surprised me most was the large number of ways that refer to nodes
with an id of 0. I assume this is a known issue. An example is way 1881900,
which was updated at 2007-08-05T12:42:58+01:00 and refers to two nodes
both with an id of 0. It's probably nothing to be alarmed about, but a
bit dodgy nonetheless.
The second most common issue appears to be ways referring to nodes that
have been deleted. An example is way 4018772 referring to node
60083908. The node was updated at "2007-09-29T10:01:07+01:00" and the
way was updated at "2007-09-29T09:58:36+01:00". Given that the node was
deleted about 3 minutes *after* the way was updated, I wonder how this
occurred. Perhaps the api performs a validity check, then waits a long
time to obtain a database lock, by which time the node has been
deleted, although 3 minutes seems like a long time to wait. If
only we had true database referential integrity :-)
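As a quick check on the gap between the two timestamps quoted above,
here is a short Python calculation. The variable names are just for
illustration, and it assumes Python 3.7 or later for
datetime.fromisoformat.

```python
from datetime import datetime

# Timestamps quoted above for way 4018772 and node 60083908.
way_updated = datetime.fromisoformat("2007-09-29T09:58:36+01:00")
node_updated = datetime.fromisoformat("2007-09-29T10:01:07+01:00")

gap = node_updated - way_updated
print(gap)                  # 0:02:31
print(gap.total_seconds())  # 151.0
```

So the node change landed 2 minutes 31 seconds after the way change,
which is the "3 minutes" figure rounded up.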
I suspect most of the above isn't very serious, but I thought others
might find it interesting.
Cheers,
Brett