[Imports] [OSM-talk] Keeping imported data updated with source changes

Jason Remillard remillard.jason at gmail.com
Sat Jan 10 23:16:53 UTC 2015


Hi Wiktor,

If you want to automate this, you will need a database outside of
OSM that stores the matches between the municipal and OSM
addresses, the municipal differences over time, and the OSM
differences over time. The same underlying matching algorithm
previously described is used to create the OSM-to-municipal matches,
the differences in the municipal data over time, and the differences
in OSM over time. Good automation opportunities exist when you see
previously matching OSM and municipal data diverging on the municipal
side. Conflicting data (divergent changes made to both OSM and the
municipal data) could be ignored, or fed into some kind of manual
pipeline like the Tasking Manager, MapRoulette, or a QA tool. If you
can get a copy of the older address data that was actually imported,
you should be able to mostly automate catching OSM up. I don't really
see any way of automating divergent changes, since it will be
impossible for the software to know which side is "better". These are
all normal diff/merge concepts, except that rather than diffing text
files, the fuzzy matching algorithm generates the diffs.
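
For illustration, here is a minimal Python sketch of that three-way
comparison; the data model (a stored baseline per match, plus the
current OSM and municipal values) and the category names are my own
assumptions, not an existing tool:

from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchRecord:
    osm_id: int
    muni_id: str
    baseline: str            # address text when the match was last synchronized
    osm_now: Optional[str]   # current OSM value, None if the OSM object was deleted
    muni_now: Optional[str]  # current municipal value, None if dropped from the source


def classify(m: MatchRecord) -> str:
    """Three-way comparison against the stored baseline, as in a text merge."""
    osm_changed = m.osm_now != m.baseline
    muni_changed = m.muni_now != m.baseline
    if not osm_changed and not muni_changed:
        return "unchanged"      # nothing to do
    if muni_changed and not osm_changed:
        return "auto-update"    # only the source moved: safe to catch OSM up
    if osm_changed and not muni_changed:
        return "keep-osm"       # a mapper improved it; leave OSM alone
    return "conflict"           # both sides diverged: send to manual review


if __name__ == "__main__":
    rec = MatchRecord(1, "MUNI-42", "Main St 10",
                      osm_now="Main St 10", muni_now="Main St 10A")
    print(classify(rec))  # -> "auto-update"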

Thanks
Jason


On Sat, Jan 10, 2015 at 1:35 PM, Wiktor Niesiobedzki <osm at vink.pl> wrote:
> 2015-01-10 16:44 GMT+01:00 Jason Remillard <remillard.jason at gmail.com>:
>>  Hi Wiktor,
>>
>> I don't think an address tag is needed or desirable.
>>
>> The best way of doing this is to compare versions of the official data
>> (perhaps every 6 months), making a list of things that have changed so
>> that they can be examined in OSM.
>
> To compare only changes in the source, I need to know what was
> imported to OSM first. And without any reference in OSM, how do I
> guess the baseline? I could just check what's new for the last 6
> months (for example, the previous half of the calendar year), but
> then we still need some tooling to verify whether someone actually
> did it during this time and to present the backlog for specific
> areas. We have a very uneven distribution of mappers across
> geographic areas.
>
> And this way, I may fail to identify nodes that were deleted in the
> source (not all sources report deleted nodes).
>
>>
>> Of course, the big issue is that the matching is not trivial. First,
>> devise a matching score combining the distance to the address and the
>> edit distance on the address name and number. These scores are the
>> weights. Then use one of the weighted bipartite graph matching
>> algorithms (augmenting path) that work well on sparse data. If you
>> keep the search radius down, the graph will be very sparse, so it
>> should be manageable. Using the match, you can get a list of nodes
>> that have been moved, deleted, and edited in the official data set.
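
As an illustration of the scoring and assignment step quoted above,
here is a minimal Python sketch. The field names, weights, and the
100 m search radius are illustrative assumptions, and for brevity it
uses SciPy's dense Hungarian solver (linear_sum_assignment) rather
than the sparse augmenting-path matcher the quoted text recommends for
large areas:

from difflib import SequenceMatcher
from math import cos, hypot, radians

import numpy as np
from scipy.optimize import linear_sum_assignment

RADIUS_M = 100.0      # candidate search radius
NO_MATCH = 1e6        # cost for pairs outside the radius


def distance_m(a, b):
    """Rough planar distance in metres between (lat, lon) points."""
    dlat = (a[0] - b[0]) * 111_320.0
    dlon = (a[1] - b[1]) * 111_320.0 * cos(radians(a[0]))
    return hypot(dlat, dlon)


def cost(osm_addr, muni_addr):
    """Combine distance and address-text dissimilarity into one weight.

    Each address is assumed to be a dict like
    {"pos": (lat, lon), "addr": "Main St 10"}.
    """
    d = distance_m(osm_addr["pos"], muni_addr["pos"])
    if d > RADIUS_M:
        return NO_MATCH
    text_diff = 1.0 - SequenceMatcher(None, osm_addr["addr"], muni_addr["addr"]).ratio()
    return d / RADIUS_M + text_diff   # both terms roughly in [0, 1]


def match(osm_addrs, muni_addrs, threshold=1.0):
    costs = np.array([[cost(o, m) for m in muni_addrs] for o in osm_addrs])
    rows, cols = linear_sum_assignment(costs)
    # keep only plausible pairs; anything left unmatched on either side
    # shows up as an addition or a deletion
    return [(i, j) for i, j in zip(rows, cols) if costs[i, j] < threshold]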
>
> But how should I handle a real scenario such as this:
> - an address is created in the municipality
> - a mapper adds it to the map
> - the script runs, sees a new address, and finds a nearby address, but
> if there is some difference (a different street or something like
> that), should it update it or skip it? According to the rules so far,
> when I have a change in the source I should update OSM, but that might
> not be the case here.
>
> And, from an algorithmic point of view, it looks exactly the same as
> this scenario:
> - an address is created in the municipality
> - the address is imported to OSM
> - the municipality changes the street in the address
> - the script runs
>
>
> Cheers,
>
> Wiktor
>
> _______________________________________________
> talk mailing list
> talk at openstreetmap.org
> https://lists.openstreetmap.org/listinfo/talk


