[OSM-dev] Referential Integrity Report

Brett Henderson brett at bretth.com
Tue Oct 16 12:43:16 BST 2007


Hi All,

I've created a new osmosis task for reporting on referential integrity 
errors.  It was originally written in response to an issue Frederik Ramm 
was having: he was importing an inconsistent planet file into a database 
so that he could fix the MySQL sequence ids and avoid problems when 
subsequently creating data via the API.  The task takes about 15 minutes 
to run on a planet modified to use "standard" UTC dates.

Usage is:
osmosis --read-xml file=planet.osm --report-integrity file=report.txt
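
For anyone curious what a check like this boils down to, here is a rough 
Python sketch.  It is not the osmosis code, just an illustration.  It 
assumes the file lists all <node> elements before any <way>, and that 
ways reference nodes through <nd ref="..."/> children; the function name 
find_dangling_refs and the sample data are made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def find_dangling_refs(source):
    """Yield (way_id, node_id) for each way member whose node is missing.

    Assumes nodes appear before ways in the input (an assumption about
    the file layout, not something this sketch verifies).
    """
    node_ids = set()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "node":
            node_ids.add(int(elem.get("id")))
            elem.clear()  # release the element's contents once recorded
        elif elem.tag == "way":
            way_id = int(elem.get("id"))
            for nd in elem.findall("nd"):
                ref = int(nd.get("ref"))
                if ref not in node_ids:
                    yield way_id, ref
            elem.clear()


if __name__ == "__main__":
    # Tiny made-up document: way 10 refers to nodes 0 and 3, which don't exist.
    sample = b"""<osm>
      <node id="1" lat="0" lon="0"/>
      <node id="2" lat="0" lon="1"/>
      <way id="10"><nd ref="1"/><nd ref="2"/><nd ref="0"/><nd ref="3"/></way>
    </osm>"""
    for way_id, node_id in find_dangling_refs(io.BytesIO(sample)):
        print(f"way {way_id} refers to missing node {node_id}")
```

Keeping every node id in an in-memory set is the simplest approach but 
may need a lot of memory on a full planet, so treat this as a sketch for 
small extracts.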

I've run it on the 071010 planet file with some interesting results.  
The report is available from:
http://www.bretth.com/osmosis/integrity-report-071010.txt.gz

A large number of issues, primarily at the end of the report, are due to 
the inconsistent snapshot problem in the planet generation process.  
Nodes are read some time before ways, so a number of ways end up 
referring to nodes that don't exist in the dump.  This is a known quirk 
of the existing process.  For this reason, any report entries about 
nodes with ids greater than approximately 62177000 can be ignored for now.
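
If you have the problem entries as (way id, node id) pairs, dropping the 
snapshot artifacts is a one-liner.  A minimal sketch follows; the pair 
representation and the name drop_snapshot_artifacts are assumptions for 
illustration (the report file itself is plain text), and the cutoff is 
only the approximate figure quoted above.

```python
SNAPSHOT_CUTOFF = 62_177_000  # approximate cutoff for the 071010 planet


def drop_snapshot_artifacts(entries, cutoff=SNAPSHOT_CUTOFF):
    """Keep only (way_id, node_id) pairs whose node id is at or below cutoff."""
    return [(way_id, node_id) for way_id, node_id in entries if node_id <= cutoff]


if __name__ == "__main__":
    # The last pair is invented purely to show an entry being filtered out.
    entries = [(1881900, 0), (4018772, 60083908), (5000000, 62500000)]
    print(drop_snapshot_artifacts(entries))
```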

What surprised me most was the large number of ways that refer to nodes 
with an id of 0.  I assume this is a known issue.  An example is way 
1881900, which was updated at 2007-08-05T12:42:58+01:00 and refers to 
two nodes, both with an id of 0.  It's probably nothing to be alarmed 
about, but a bit dodgy nonetheless.

The second most common issue appears to be ways referring to nodes that 
have been deleted.  An example is way 4018772 referring to node 
60083908.  The node was updated at "2007-09-29T10:01:07+01:00" and the 
way was updated at "2007-09-29T09:58:36+01:00".  Given that the node was 
deleted nearly 3 minutes *after* the way was last updated, I wonder how 
this occurred.  It could be due to the API performing a validity check 
and then waiting a long time to obtain a database lock, by which time 
the node has been deleted, although 3 minutes seems like a long time.  
If only we had true database referential integrity :-)
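
To illustrate what database-level referential integrity would buy, here 
is a small sketch using SQLite from Python.  It is not the OSM schema 
(the table and column names are invented, and the real database is 
MySQL with its own delete semantics); it only shows a foreign key 
refusing to delete a node that a way still refers to, regardless of 
what the application checked beforehand.

```python
import sqlite3


def demo_foreign_key_guard():
    """Return (rejected, remaining_nodes) after trying to delete a used node."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE way_nodes ("
        " way_id INTEGER NOT NULL,"
        " node_id INTEGER NOT NULL REFERENCES nodes(id))"
    )
    conn.execute("INSERT INTO nodes (id) VALUES (60083908)")
    conn.execute("INSERT INTO way_nodes (way_id, node_id) VALUES (4018772, 60083908)")
    try:
        # The way still references the node, so the database refuses this.
        conn.execute("DELETE FROM nodes WHERE id = 60083908")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    remaining = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
    conn.close()
    return rejected, remaining


if __name__ == "__main__":
    print(demo_foreign_key_guard())
```

With the constraint in place, the gap between an application-level 
check and the actual write no longer matters, because the database 
itself rejects the delete.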

I suspect most of the above isn't very serious, but I thought others 
might find it interesting.

Cheers,
Brett




