[Talk-ca] duplicate address data

Gerd Petermann gpetermann_muenchen at hotmail.com
Sun Mar 29 05:43:37 UTC 2015




Hi Stewart,

>> I don't care much about special cases.
>
>I'd say that rural addressing is between 10-20% of addresses in Ontario.
>Far from a special case.

OK. I understand that this is a problem; I just don't deal with it because
I can't solve it with my knowledge.

>
>> I wanted to point out that the OSM data base for Canada contains a
>> huge amount of
>> - useless data like duplicated addr:interpolation ways including nodes
>> from different imports
>>  which IMHO should be removed ASAP
>
>Yes, I agree that there are some errors, but we can't guarantee that the
>Canvec 10 data will be much better, or that some of the older data is
>bad just because of its version. Imports work really badly in Canada, as
>our source data isn't wonderful and we don't have enough folks on the
>ground to verify.

Let's start with the simple problem first.
I don't want to replace data, I just want to remove completely obsolete
data. I don't know what the best way to do that is.
I can code a small program which scans a download from Geofabrik
with rules like this:
1) select nodes which are referenced as the first or last node
in addr:interpolation ways
and which are not referenced by any other way or relation,
2) of those nodes, find the ones with equal (or almost equal) coordinates and
equal tags except source=*, and mark them,
3) select such a pair of equal nodes, let's call them n1 and n2,
4) select the addr:interpolation ways that have such marked nodes,
let's call them w1 and w2,
5) make sure that w1 and w2 have no common node,
6) make sure that w1 and w2 end with another pair of marked nodes,
7) if both ways have a source tag containing "CanVec", select the one
with the older version, let's call it w_older,
8) make sure that none of the nodes referenced by w_older
is referenced by another way or relation,
9) remove w_older and all its nodes.
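
A minimal sketch of how these rules might look in Python, working on data
that has already been parsed into memory. Everything named here (Node, Way,
node_key, canvec_version, find_obsolete) is hypothetical. Two assumptions go
beyond the rules above: "almost equal" is approximated by rounding the
coordinates to 6 decimals, and "older version" is read as the lower CanVec
release number found in the way's source tag (e.g. "CanVec 6.0 - NRCan").
If "older" should instead mean something else, only canvec_version needs
to change.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)


@dataclass
class Way:
    refs: list                      # node ids, in order
    tags: dict = field(default_factory=dict)


def canvec_version(way):
    """Assumed helper: CanVec release number from the source tag, or None."""
    m = re.search(r"canvec\s*(\d+(?:\.\d+)?)", way.tags.get("source", ""), re.I)
    return float(m.group(1)) if m else None


def node_key(node, digits=6):
    """Rounded position plus all tags except source=*."""
    tags = tuple(sorted((k, v) for k, v in node.tags.items() if k != "source"))
    return (round(node.lat, digits), round(node.lon, digits), tags)


def find_obsolete(nodes, ways, relation_node_refs=frozenset()):
    """Return (way ids, node ids) that the rules would delete."""
    users = defaultdict(set)        # node id -> ids of the ways referencing it
    for wid, way in ways.items():
        for ref in way.refs:
            users[ref].add(wid)

    def private(ref, wid):
        # True if the node is used by this way only and by no relation.
        return users[ref] == {wid} and ref not in relation_node_refs

    # Mark end nodes of interpolation ways, grouped by position and tags.
    groups = defaultdict(list)      # key -> [(way id, other end node id)]
    for wid, way in ways.items():
        if "addr:interpolation" not in way.tags or len(way.refs) < 2:
            continue
        first, last = way.refs[0], way.refs[-1]
        for end, other in ((first, last), (last, first)):
            if private(end, wid):
                groups[node_key(nodes[end])].append((wid, other))

    del_ways, del_nodes = set(), set()
    for members in groups.values():
        for i, (w1, o1) in enumerate(members):
            for w2, o2 in members[i + 1:]:
                if w1 == w2 or w1 in del_ways or w2 in del_ways:
                    continue
                if set(ways[w1].refs) & set(ways[w2].refs):
                    continue        # the two ways share a node
                # The other ends must form a matching pair as well.
                if not (private(o1, w1) and private(o2, w2)
                        and node_key(nodes[o1]) == node_key(nodes[o2])):
                    continue
                v1, v2 = canvec_version(ways[w1]), canvec_version(ways[w2])
                if v1 is None or v2 is None or v1 == v2:
                    continue        # not both CanVec, or no older one
                older = w1 if v1 < v2 else w2
                # No node of the older way may be used anywhere else.
                if all(private(r, older) for r in ways[older].refs):
                    del_ways.add(older)
                    del_nodes.update(ways[older].refs)
    return del_ways, del_nodes
```

The function only reports candidates; it deletes nothing. Rounding can miss
two nodes that sit on opposite sides of a rounding boundary, so a real run
would probably want a proper distance check instead.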

I think we will find thousands of ways.
I have no idea how bots work on the OSM database, but I think
this would be a task for one.
If I wrote such a program, it would produce an *.osm file
containing a lot of rows like this (or whatever is needed to delete the ways and nodes):

<?xml version='1.0' encoding='UTF-8'?>
<osm version='0.6' upload='true' generator='CanVec-Cleaner'>
  <bounds minlat='45.4333348' minlon='-76.3457702' maxlat='45.4351546' maxlon='-76.3437317' origin='CGImap 0.3.3 (11726 thorn-01.openstreetmap.org)' />
  <node id='972298820' action='delete' timestamp='2010-10-31T15:13:05Z' uid='186592' user='Johnwhelan' visible='true' version='1' changeset='6240358' lat='45.4338469' lon='-76.3437594'>  </node>
  <node id='972299268' action='delete' timestamp='2010-10-31T15:13:25Z' uid='186592' user='Johnwhelan' visible='true' version='1' changeset='6240358' lat='45.4346425' lon='-76.3457425'> </node>
  <way id='83504524' action='delete' timestamp='2010-10-31T15:19:40Z' uid='186592' user='Johnwhelan' visible='true' version='1' changeset='6240358'> </way>
</osm>
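
For illustration, a file of that shape could be written with Python's
standard xml.etree.ElementTree module, roughly as sketched here. The
function name build_delete_file and the dict layout of its inputs are made
up for this example, and the set of attributes is simply a subset of the
ones in the sample above. Which attributes an upload tool really needs is
an open question.

```python
import xml.etree.ElementTree as ET


def build_delete_file(nodes, ways, generator="CanVec-Cleaner"):
    """Build an .osm document that marks the given objects with action='delete'.

    nodes: iterable of dicts with the keys id, version, lat, lon
    ways:  iterable of dicts with the keys id, version
    (this input layout is an assumption made for the sketch)
    """
    root = ET.Element("osm", version="0.6", upload="true", generator=generator)
    for n in nodes:
        ET.SubElement(root, "node", id=str(n["id"]), action="delete",
                      version=str(n["version"]),
                      lat=str(n["lat"]), lon=str(n["lon"]))
    for w in ways:
        ET.SubElement(root, "way", id=str(w["id"]), action="delete",
                      version=str(w["version"]))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_delete_file(
        [{"id": 972298820, "version": 1, "lat": 45.4338469, "lon": -76.3437594}],
        [{"id": 83504524, "version": 1}],
    ))
```

The version attribute is carried along because the sample has it; the ids
and coordinates in the usage lines are taken from the sample as well.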

>
>> - wrong data like
>>>>  *  addr:interpolation ways with nodes that refer to a different street
>
>Is there a way to make interpolation names change if the street name is
>edited/corrected? Unless this happens, I see these errors as inevitable.
I see no easy way to automate that. The problem is that you can't say for
sure that the road has the right name and all addr:interpolation nodes are wrong.
I guess one could try to analyse the changesets, but I have no knowledge here.

Gerd




 		 	   		  