[OSM-talk] Adding wikidata tags to the remaining objects with only wikipedia tag

Yuri Astrakhan yuriastrakhan at gmail.com
Mon Oct 2 06:04:03 UTC 2017


>   I will repeat that this is not something which COULD be done, this
> comparison is something, what IS ACTUALLY DONE and has been done for
> years.


Tomas, this is what I understand from what you are saying:
* You download a geotagging wikidata dump and generate a table with
latitude, longitude, and a wiki page title.
* You also generate the same table from OSM for all nodes, ways (using geo
centroid?), and relations (using ??)
* You compare article titles between the two, and when OSM has something
that Wikipedia doesn't, you search automatically by geo proximity, or you
let users fix it or ??
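
To make sure we are talking about the same thing, here is a minimal sketch
of the title comparison step as I understand it. It is not Tomas's actual
code: the function name, the (lat, lon, title) row layout, and the sample
rows are all assumptions made up for illustration.

```python
def titles_missing_from_dump(osm_rows, dump_rows):
    """Return the OSM rows whose wiki page title is absent from the geo dump.

    Both inputs are lists of (lat, lon, title) tuples (assumed layout).
    """
    dump_titles = {title for (_lat, _lon, title) in dump_rows}
    return [row for row in osm_rows if row[2] not in dump_titles]


# Made-up sample rows, for illustration only.
osm_rows = [
    (54.6872, 25.2797, "lt:Vilnius"),
    (54.9000, 23.9000, "lt:Senas pavadinimas"),
]
dump_rows = [
    (54.6872, 25.2797, "lt:Vilnius"),
]

missing = titles_missing_from_dump(osm_rows, dump_rows)
```

Anything that ends up in `missing` still needs a second step (proximity
search or a human) to decide what the tag should point to.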

If I understood you correctly (and please correct me if I did not), this
would not work for the whole planet, simply because the average distance
between what OSM has and what Wikidata has is far too great to be useful.
Maybe Lithuania, being a relatively small area with a very active
community, has been kept in perfect shape (with each geo point identical in
both Wikidata & OSM, which might be a licensing issue), but in the current
state of the world's OSM data only 17% of nodes are within 10 meters of
their Wikidata counterpart. If we count ways and relations, it drops to
11% -- http://tinyurl.com/ybp4tp7a
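
For anyone who wants to reproduce this kind of number outside the query
service, a rough sketch of the calculation follows. It is not the query
behind the link above. It assumes you already have pairs of
(OSM point, Wikidata point) as (lat, lon) tuples, and it uses the haversine
great-circle formula with a mean Earth radius; the function names are mine.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def share_within(pairs, threshold_m=10.0):
    """Fraction of (osm_point, wikidata_point) pairs within threshold_m meters."""
    if not pairs:
        return 0.0
    close = sum(
        1 for (osm, wd) in pairs
        if haversine_m(osm[0], osm[1], wd[0], wd[1]) <= threshold_m
    )
    return close / len(pairs)
```

With one identical pair and one pair about 111 meters apart (0.001 degrees
of latitude), `share_within` gives 0.5 at the 10 meter threshold.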

In other words, with your approach, you can detect when OSM's wikipedia tag
is no longer correct, because the Wikipedia geo dump no longer has it. But
afterwards you have to go and fix it by hand.  And this is pretty much the
only operation you can do with this approach.  You cannot analyze the tens
of thousands of existing wikipedia tags that point to links, disambigs,
people, tree species, places of business - you can only mark them as "geo
missing in Wikipedia".
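
By contrast, once each object is linked to a Wikidata item, the item's
"instance of" values make these cases easy to flag. The sketch below is
only an illustration, not one of the cleanup queries: it assumes you
already have a mapping from an OSM object to the set of "instance of" item
IDs of its linked Wikidata item, and the OSM identifiers in the sample are
made up.

```python
# Wikidata classes that usually indicate a wrong target for a map feature.
PROBLEM_CLASSES = {
    "Q4167410": "disambiguation page",
    "Q13406463": "list article",
    "Q5": "human",
}


def flag_suspicious(linked_types):
    """Map each OSM object to the reasons its linked item looks wrong.

    linked_types: dict of OSM object id -> set of "instance of" item IDs
    (assumed input; how it is obtained is out of scope here).
    """
    flagged = {}
    for osm_id, type_ids in linked_types.items():
        reasons = sorted(PROBLEM_CLASSES[t] for t in type_ids if t in PROBLEM_CLASSES)
        if reasons:
            flagged[osm_id] = reasons
    return flagged


# Hypothetical sample: node/1 links to a disambiguation page, node/2 to a
# city (Q515), node/3 to a person.
sample = {
    "node/1": {"Q4167410"},
    "node/2": {"Q515"},
    "node/3": {"Q5"},
}
flagged = flag_suspicious(sample)
```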

I took a quick look at the various quality control queries I built on the
cleanup page.  Lithuania does seem pretty clean, with only one
disambiguation at the moment (has been there for 4 months) -
https://www.openstreetmap.org/node/1717783246 - but both have the same
location, two airports that point to a list -
https://www.openstreetmap.org/node/1042034645 and
https://www.openstreetmap.org/node/1042034660 . Your approach cannot find
any of these issues, nor can it detect renaming. For the rest of the world,
the situation is much worse.