[OSM-talk] import de-duplication

Thu Jul 5 11:03:24 BST 2007

Just had a few thoughts about systems to merge in large datasets (AND, TIGER, etc.).

Where there is existing OSM data in the vicinity, we need to search both 
datasets for features whose attributes match. Where there is a match, do we 
update OSM, or leave it alone?

For instance, suppose AND says the equivalent of a segment from X1,Y1 to X2,Y2 
is highway=motorway;oneway=true;ref=A17, and OSM has a segment from 
X1+0.000023,Y1-0.00064 to X2-0.00037,Y2+0.0000567 tagged highway=motorway. Do 
we assume it's actually the same road? If so, do we just update the tags on 
the OSM data set?
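
A rough sketch of such a matching test, in Python, might look like the 
following. It is only an illustration: it assumes X is longitude and Y is 
latitude in degrees, the dictionary layout and the names distance_m, same_road 
and merge_tags are made up for this example, the 100 m tolerance is arbitrary, 
and "keep existing OSM values, add only missing tags" is just one possible 
answer to the update question.

```python
from math import cos, hypot, radians

METRES_PER_DEGREE = 111_320.0  # approximate length of one degree of latitude


def distance_m(a, b):
    """Approximate distance in metres between two (lon, lat) points.

    Uses an equirectangular approximation, which is adequate over the
    tens of metres that matter here.
    """
    mean_lat = radians((a[1] + b[1]) / 2)
    dx = (a[0] - b[0]) * cos(mean_lat) * METRES_PER_DEGREE
    dy = (a[1] - b[1]) * METRES_PER_DEGREE
    return hypot(dx, dy)


def same_road(seg_a, seg_b, tolerance_m=100.0):
    """Guess whether two segments describe the same road.

    Both endpoints must lie within tolerance_m of each other (in either
    direction), and any tag present on both segments must agree.
    """
    a1, a2 = seg_a["ends"]
    b1, b2 = seg_b["ends"]
    forward = distance_m(a1, b1) <= tolerance_m and distance_m(a2, b2) <= tolerance_m
    reverse = distance_m(a1, b2) <= tolerance_m and distance_m(a2, b1) <= tolerance_m
    if not (forward or reverse):
        return False
    shared = seg_a["tags"].keys() & seg_b["tags"].keys()
    return all(seg_a["tags"][k] == seg_b["tags"][k] for k in shared)


def merge_tags(osm_tags, import_tags):
    """Add tags that OSM lacks; existing OSM values are never overwritten."""
    return {**import_tags, **osm_tags}


# Hypothetical coordinates standing in for X1,Y1 and X2,Y2.
x1, y1, x2, y2 = -0.5, 52.0, -0.49, 52.01
and_seg = {
    "ends": ((x1, y1), (x2, y2)),
    "tags": {"highway": "motorway", "oneway": "true", "ref": "A17"},
}
osm_seg = {
    "ends": ((x1 + 0.000023, y1 - 0.00064), (x2 - 0.00037, y2 + 0.0000567)),
    "tags": {"highway": "motorway"},
}

if same_road(and_seg, osm_seg):
    osm_seg["tags"] = merge_tags(osm_seg["tags"], and_seg["tags"])
```

Note that an offset of 0.00064 degrees of latitude is roughly 70 metres, so a 
tolerance much tighter than that would treat the two segments above as 
different roads.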

The other option would be to leave a de-militarized zone of a few metres 
around the existing features in the OSM data set. AND data falling inside that 
zone would not be imported automatically; instead, the nearest features from 
the AND dataset would be tagged 'here be dragons', for humans to go in and 
join things up on the borders.
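
The buffer-zone idea could be sketched along these lines. Again this is only 
an illustration under assumptions: coordinates are taken to be already 
projected to metres, each feature is a single straight segment, the 5 m buffer 
is arbitrary, and the function names are invented for the example. Features 
that come within the buffer are set aside for human review instead of being 
imported.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates, metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return hypot(px - ax, py - ay)  # degenerate segment: a == b
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def _cross(o, a, b):
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def segments_cross(s, t):
    """True if segments s and t properly cross each other."""
    d1, d2 = _cross(t[0], t[1], s[0]), _cross(t[0], t[1], s[1])
    d3, d4 = _cross(s[0], s[1], t[0]), _cross(s[0], s[1], t[1])
    return d1 * d2 < 0 and d3 * d4 < 0


def segment_distance(s, t):
    """Shortest distance between two segments, 0.0 if they cross or touch."""
    if segments_cross(s, t):
        return 0.0
    return min(
        point_segment_distance(s[0], *t),
        point_segment_distance(s[1], *t),
        point_segment_distance(t[0], *s),
        point_segment_distance(t[1], *s),
    )


def classify(import_segments, osm_segments, buffer_m=5.0):
    """Split import segments into (safe to import, 'here be dragons')."""
    to_import, to_review = [], []
    for seg in import_segments:
        near = any(segment_distance(seg, o) <= buffer_m for o in osm_segments)
        (to_review if near else to_import).append(seg)
    return to_import, to_review


# Hypothetical data: one existing OSM road and three candidate import segments.
osm = [((0.0, 0.0), (100.0, 0.0))]
candidates = [
    ((0.0, 3.0), (100.0, 3.0)),     # runs 3 m alongside the OSM road
    ((0.0, 50.0), (100.0, 50.0)),   # 50 m away, clear of the buffer
    ((50.0, -10.0), (50.0, 10.0)),  # crosses the OSM road
]
safe, dragons = classify(candidates, osm)
```

With this data the second candidate ends up in `safe` and the other two in 
`dragons`. A real import would need a spatial index rather than comparing 
every pair of segments.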

--
Simon Hewison