[OSM-dev] Wikipedia Matching

Andy Townsend ajt1047 at gmail.com
Fri May 19 11:23:21 UTC 2017


On 19/05/2017 10:11, Christoph Lingg wrote:
>   TagInfo states around 1 Million wikipedia/wikidata tags which is a great start.
>

(not directly related to the question, but relevant to the accuracy of 
the data links)

I'd certainly take some of those added tags with a pinch of salt.  A 
number of "place" objects near me have been linked to wikidata items by 
a well-meaning wikipedian, but unfortunately they don't actually match.  
What tends to happen is something like:

o OSM has a place object for a village and an admin entity

o An OSM user adds a wikipedia tag to the admin entity.  The wikipedia 
entry describes itself as covering both the village and the admin 
entity, so that's OK.

o A wikipedian writes a bot that creates a wikidata item from the 
wikipedia article.  The bot creates wikidata entries for villages, not 
admin entities.  That's not entirely wrong, because the wikipedia 
article actually covers both.

o A different wikipedian spots that there is an OSM admin entity and a 
wikidata item with the same name in a similar location and links them 
via a wikidata tag.  This results in the wrong OSM entity being linked 
to a wikidata item.

If you're consuming this data downstream you may want to add some 
processing that drops "dubious" links.  How you calculate "dubious" is 
difficult, but you may be able to look at the OSM account that added the 
wikidata link and exclude those links added by a user who has added 
links worldwide (i.e. who clearly doesn't have local knowledge), or by a 
user who has added links unfeasibly quickly (no manual checking 
possible) or whose changeset comments have received a lot of discussion 
saying "that link you added is wrong".  That last bit is most difficult 
because many changeset discussion comments are positive (e.g. "thanks 
for adding that wikidata link").
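
As a rough sketch of that sort of filtering (not something I've run 
against real data): the record layout, the function names and all of 
the thresholds below are made up for illustration, and it assumes that 
you have already extracted who added each wikidata tag, where and when, 
and that you have somehow counted the negative changeset comments per 
user, which is the hard part.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class WikidataLink:
    """One wikidata tag added to an OSM object (hypothetical record layout)."""
    osm_id: str
    qid: str
    user: str
    lat: float
    lon: float
    added_at: datetime


def peak_links_in_window(times, window):
    """Largest number of timestamps falling within any one span of `window`."""
    times = sorted(times)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Slide the start forward until the span fits inside the window.
        while t - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


def dubious_users(links, negative_comments=None, max_span_deg=20.0,
                  window=timedelta(minutes=1), max_links_per_window=10,
                  max_negative_comments=3):
    """Return the set of users whose links look dubious.

    All thresholds are arbitrary assumptions and would need tuning.
    `negative_comments` maps user name -> count of critical changeset
    comments, classified by some other means.
    """
    negative_comments = negative_comments or {}
    by_user = defaultdict(list)
    for link in links:
        by_user[link.user].append(link)

    flagged = set()
    for user, items in by_user.items():
        lats = [link.lat for link in items]
        lons = [link.lon for link in items]
        # Crude "worldwide" test: bounding box span in degrees.
        # This ignores wrap-around at the antimeridian.
        worldwide = (max(lats) - min(lats) > max_span_deg
                     or max(lons) - min(lons) > max_span_deg)
        # "Unfeasibly quickly": too many links inside one short window.
        peak = peak_links_in_window([link.added_at for link in items], window)
        too_fast = peak > max_links_per_window
        disputed = negative_comments.get(user, 0) > max_negative_comments
        if worldwide or too_fast or disputed:
            flagged.add(user)
    return flagged


def drop_dubious(links, flagged):
    """Keep only the links added by users who were not flagged."""
    return [link for link in links if link.user not in flagged]
```

You'd call dubious_users() once over the whole extract and then pass 
the result to drop_dubious().  A sliding window is used for the speed 
check rather than an overall average so that a burst of automated edits 
still shows up for an account that is otherwise quiet.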

Best Regards,

Andy



