[OSM-dev] Osmosis and Postgresql

Jeffrey Warren warren at mit.edu
Fri Jul 10 14:44:58 BST 2009


I attempted to find all orphan current_way_nodes, which have no
current_node, and are causing the constraint to fail:

SELECT * FROM current_way_nodes LEFT JOIN current_nodes ON
current_nodes.id= current_way_nodes.node_id WHERE
current_nodes.id IS NULL

This seems to hang, but perhaps I just need to let it run for a long time. I
tried formulating this query a few different ways but it seems that since it
cannot use an index on current_nodes.id, the query must necessarily be quite
slow. If you limit by id, you can get some results:

SELECT * FROM current_way_nodes LEFT JOIN current_nodes ON
current_nodes.id= current_way_nodes.node_id WHERE
current_way_nodes.id < 10000 AND current_nodes.id IS NULL

yields several current_way_nodes in the planet dump which don't have
current_node counterparts:

current_way_nodes:
id;node_id;sequence_id
650;420418211;1
650;420418214;2
650;420418213;3
650;420418212;4
2477;411393928;2
2477;411371802;4

And I traced the lifecycle of a
couple records that violated the foreign key constraint (left a
current_way_node as an orphan) and found this:


<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="OpenStreetMap server">
  <node id="17704640" lat="51.5465346" lon="-2.4116768" changeset="127793"
user="southglos" uid="3937" visible="true" timestamp="2006-10-06T11:08:17Z"
version="1">
    <tag k="created_by" v="JOSM"/>
  </node>
  <node id="17704640" lat="51.5465346" lon="-2.4116768" changeset="601639"
user="Strange but untrue" uid="57932" visible="true"
timestamp="2009-04-16T22:28:59Z" version="2"/>
  <node id="17704640" lat="51.5465346" lon="-2.4116768" changeset="601639"
user="Strange but untrue" uid="57932" visible="true"
timestamp="2009-04-16T22:31:23Z" version="3"/>
  <node id="17704640" lat="51.5465346" lon="-2.4116768" changeset="601639"
user="Strange but untrue" uid="57932" visible="true"
timestamp="2009-04-16T22:32:07Z" version="4"/>
  <node id="17704640" lat="51.5465346" lon="-2.4116768" changeset="601639"
user="Strange but untrue" uid="57932" visible="true"
timestamp="2009-04-16T22:32:19Z" version="5"/>
  <node id="17704640" lat="51.5465346" lon="-2.4116768" changeset="601639"
user="Strange but untrue" uid="57932" visible="false"
timestamp="2009-04-16T22:32:21Z" version="6"/>
</osm>

<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="OpenStreetMap server">
  <node id="17704638" lat="51.5454224" lon="-2.4112624" changeset="7133"
user="southglos" uid="3937" visible="true" timestamp="2007-04-05T16:30:26Z"
version="1">
    <tag k="created_by" v="JOSM"/>
  </node>
  <node id="17704638" lat="51.5454224" lon="-2.4112624" changeset="601639"
user="Strange but untrue" uid="57932" visible="true"
timestamp="2009-04-16T22:28:58Z" version="2"/>
  <node id="17704638" lat="51.5454224" lon="-2.4112624" changeset="601639"
user="Strange but untrue" uid="57932" visible="true"
timestamp="2009-04-16T22:31:22Z" version="3"/>
  <node id="17704638" lat="51.5454224" lon="-2.4112624" changeset="601639"
user="Strange but untrue" uid="57932" visible="true"
timestamp="2009-04-16T22:32:07Z" version="4"/>
  <node id="17704638" lat="51.5454224" lon="-2.4112624" changeset="601639"
user="Strange but untrue" uid="57932" visible="true"
timestamp="2009-04-16T22:32:18Z" version="5"/>
  <node id="17704638" lat="51.5454224" lon="-2.4112624" changeset="601639"
user="Strange but untrue" uid="57932" visible="false"
timestamp="2009-04-16T22:32:21Z" version="6"/>
</osm>

It does look like these are marked as visible=false, after which their
current_node entries are deleted (neither of these two are in the 'current'
dataset anymore). For these offending nodes, the current_way_node entries
were not also deleted. This is why i asked if anyone knows how visible=false
causes these records to be actually dropped from the current_ tables.

So that's my best guess at how this happens. Is it a problem? Not for me, I
guess, but as you said, some applications may continue to take action on
this data and future corruption could perhaps occur.

Best,
Jeff



On Wed, Jul 8, 2009 at 8:00 PM, Joachim Zobel <jz-2008 at heute-morgen.de>wrote:

> Am Dienstag, den 07.07.2009, 18:50 -0400 schrieb Jeffrey Warren:
> > No, the data is not corrupt, nor are there duplicate entries. Quite a
> > few current_way_nodes are not getting deleted when they cease to be
> > 'current', that is, once they're only
> > history and no longer part of the current planet.
>
> Ah, I see. It's not corruption, it's just invalid data hanging around :)
>
> How can this happen? current_way_nodes has foreign keys on current_ways
> and current_nodes.
>
> > I asked earlier about what mechanism actually deletes
> > current_way_nodes if marked for deletion (i think the tag
> > is 'visible=false' but i might be mistaken) but nobody seemed to know
>
> Are you shure that visible=false means "marked for deletion".
> current_way_nodes do not have any such flag, only current_ways and
> current_nodes have that.
>
> >  - it seems the best route is to go through and delete those orphans,
> > but I couldn't  write a single SQL query to do so efficiently and
> > before writing a script to do it, I realized that the rails port
> > doesn't really care if they're there, or if the constraint is
> > created.
>
> Applikations do in general work without constraints. They just tend to
> corrupt data. Once the data is corrupt, applications start to behave
> weird.
>
> Please note that I do not understand the ways of the API, I just started
> to answer because you seemed to have a database problem (I do understand
> databases). The above answer results from just applying some "database
> common sense".
>
> Maybe someone who does understand the API can answer this.
>
> If you need an SQL statement, just state the query in english, that
> should be easily doable.
>
> Sincerely,
> Joachim
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/dev/attachments/20090710/bb55e123/attachment.html>


More information about the dev mailing list