[OSM-dev] Wikipedia Matching
ajt1047 at gmail.com
Fri May 19 11:23:21 UTC 2017
On 19/05/2017 10:11, Christoph Lingg wrote:
> TagInfo states around 1 Million wikipedia/wikidata tags which is a great start.
(not directly related to the question, but relevant to the accuracy of
the data links)
I'd certainly take some of those added tags with a pinch of salt. A
number of "place" objects near me have been linked to wikidata items by a
well-meaning wikipedian, but unfortunately they don't actually match.
What tends to happen is something like:
o OSM has two objects: a place object for a village and a separate
admin entity.
o An OSM user adds a wikipedia tag to the admin entity. The wikipedia
entry describes itself as covering both the village and the admin
entity, so that's OK.
o A wikipedian writes a bot that creates a wikidata item from the
wikipedia article. The bot creates wikidata entries for villages, not
admin entities. That's not entirely wrong, because the wikipedia
article actually covers both.
o A different wikipedian spots that there is an OSM admin entity and a
wikidata item with the same name in a similar location and links them
via a wikidata tag. The OSM admin entity is now linked to a wikidata
item that describes the village, i.e. the wrong OSM entity carries
the link.
If you're consuming this data downstream you may want to add some
processing that drops "dubious" links. How you calculate "dubious" is
difficult, but you may be able to look at the OSM account that added the
wikidata link and exclude those links added by a user who has added
links worldwide (i.e. who clearly doesn't have local knowledge), or by a
user who has added links unfeasibly quickly (no manual checking
possible) or whose changeset comments have received a lot of discussion
saying "that link you added is wrong". That last check is the hardest
to automate, because many changeset discussion comments are positive
(e.g. "thanks for adding that wikidata link"), so a raw comment count
isn't enough.
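
As a rough sketch of the first two checks (links spread worldwide, and
links added unfeasibly quickly), something like the following might do.
It assumes you have already extracted, for each wikidata tag, the OSM
user who added it, the time of the edit and a coordinate for the object.
The LinkEdit record, the function names and both thresholds are made up
for illustration and would need tuning against real data; the changeset
discussion check is left out because it needs some reading of the
comments themselves.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LinkEdit:
    """One added wikidata tag (hypothetical record, not an OSM API type)."""
    user: str
    timestamp: datetime
    lat: float
    lon: float


def dubious_users(edits, max_span_deg=20.0, min_seconds_per_link=5.0):
    """Return the set of users whose links look unlikely to be hand-checked.

    Both thresholds are illustrative guesses, not recommended values.
    """
    by_user = defaultdict(list)
    for edit in edits:
        by_user[edit.user].append(edit)

    flagged = set()
    for user, items in by_user.items():
        lats = [e.lat for e in items]
        lons = [e.lon for e in items]
        # Crude "worldwide" test: extent of the bounding box in degrees.
        # This ignores wrap-around at the antimeridian.
        span = max(max(lats) - min(lats), max(lons) - min(lons))
        if span > max_span_deg:
            flagged.add(user)
            continue
        # Crude "too fast" test: average gap between consecutive links.
        if len(items) > 1:
            times = sorted(e.timestamp for e in items)
            elapsed = (times[-1] - times[0]).total_seconds()
            if elapsed / (len(items) - 1) < min_seconds_per_link:
                flagged.add(user)
    return flagged


def drop_dubious(edits, **thresholds):
    """Keep only the links added by users that were not flagged."""
    bad = dubious_users(edits, **thresholds)
    return [e for e in edits if e.user not in bad]
```

Averaging over a user's whole history is deliberately simple: it will
miss someone who mixes careful local edits with occasional fast bursts,
so checking per changeset rather than per user may work better.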