[OSM-talk] Adding Wikidata tags to 70k items automatically

SomeoneElse lists at mail.atownsend.org.uk
Thu Aug 28 11:04:43 UTC 2014


On 27/08/2014 22:15, Andy Mabbett wrote:
> What, again? ;-)

You've been "beating the drum" for Wikidata for a while, but that's 
mostly been on the GB list or even more locally.  I definitely think 
that it's worth explaining the benefits on talk@.

>
> For example:
>
> Wikidata has data on each of these entities which either isn't in OSM
> (who's the mayor of this town / vicar of this church?)

OK - not sure how that's a benefit to OSM as such, though I'm sure 
people could do "useful unexpected things" with those links.

> or which acts as
> a sanity check for what is in OSM (We can generate lists where the
> two disagree, for humans to check and fix).

That sounds useful, but sounds like "in theory someone could generate a 
list" rather than actually volunteering to do so.
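
A minimal sketch of how such a disagreement list could be generated, 
assuming the OSM objects and the Wikidata labels have already been 
extracted into plain Python structures. The function name, the record 
layout and the sample data are all hypothetical, chosen only for 
illustration; this is not an existing tool.

```python
def find_name_mismatches(osm_items, wikidata_labels):
    """Return (osm_id, qid, osm_name, wikidata_label) for pairs that disagree.

    osm_items: list of dicts with "id", "name" and "wikidata" keys (assumed layout).
    wikidata_labels: dict mapping a Wikidata ID to its label in one language.
    """
    mismatches = []
    for item in osm_items:
        qid = item.get("wikidata")
        # Skip objects with no link, or whose linked item we have no label for.
        if qid is None or qid not in wikidata_labels:
            continue
        osm_name = item.get("name", "")
        wd_label = wikidata_labels[qid]
        # Compare case-insensitively, ignoring surrounding whitespace.
        if osm_name.strip().casefold() != wd_label.strip().casefold():
            mismatches.append((item["id"], qid, osm_name, wd_label))
    return mismatches


# Made-up sample data: the second object is deliberately linked to the wrong item.
osm_items = [
    {"id": "node/1", "name": "Berlin", "wikidata": "Q64"},
    {"id": "node/2", "name": "Springfield", "wikidata": "Q64"},
    {"id": "node/3", "name": "Unlinked Place"},
]
wikidata_labels = {"Q64": "Berlin"}

for row in find_name_mismatches(osm_items, wikidata_labels):
    print(row)
```

The output is only a list of candidates for a human to look at; an exact 
string comparison like this would flag legitimate spelling variants too.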

> Wikidata has multi-lingual labels for many objects, which OSM
> renderers can fetch via the Wikidata link.

That's definitely useful.  It would allow us to split the "verifiable on 
the ground" stuff from the other stuff - it should save us having 190 
names for Berlin that mostly say "Berlin".
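
As a rough illustration of what a renderer might do with those labels, 
the sketch below picks a display name from an already-fetched Wikidata 
entity. It assumes the entity is a dict whose "labels" entry maps a 
language code to an object with a "value" field, as in Wikidata's JSON 
entity format; the function name and the fallback behaviour are 
assumptions, and fetching the entity is left out.

```python
def pick_label(entity, preferred_langs, fallback=None):
    """Return the first available label in preferred_langs, else fallback."""
    labels = entity.get("labels", {})
    for lang in preferred_langs:
        entry = labels.get(lang)
        if entry and entry.get("value"):
            return entry["value"]
    return fallback


# Hand-written sample entity for illustration, not fetched from Wikidata.
berlin = {
    "id": "Q64",
    "labels": {
        "de": {"language": "de", "value": "Berlin"},
        "it": {"language": "it", "value": "Berlino"},
    },
}

# Prefer Italian, then English; fall back to the on-the-ground OSM name.
print(pick_label(berlin, ["it", "en"], fallback="Berlin"))
```

With something like this, OSM would only need to carry the local name 
plus the wikidata tag, and the translated names could stay in Wikidata.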

Another one (mentioned on IRC) is a way to get up to date population 
data for places - data that couldn't or shouldn't be in OSM for licence 
reasons, or (like your "vicars" example) is continuously changing and 
not easily verifiable.


> What disadvantages do you foresee?

Maintainability, as has already been mentioned.  With any import there 
has to be a plan for "how do we make sure this data stays up to date", 
and I'm not seeing that yet.

Another issue is with "dodgy data" on either the OSM or the Wikidata 
side.  I've already mentioned "non-existing villages" in Wikipedia, but 
there are also examples where the OSM side's iffy too, which could 
result in a false match.

(assuming that it's considered a good idea to add the tags at all) 71k 
worldwide matches doesn't sound like _that_ many to check manually - 
it'd be useful to know how many of those matches there were per (US) 
state, (UK) county or (DE) Land, or similar.
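
A per-region breakdown like that could be produced with a simple tally, 
sketched below. It assumes each proposed match already carries a 
"region" value (how that would be derived, e.g. from an enclosing 
admin boundary, is not covered here); the field names and sample 
records are hypothetical.

```python
from collections import Counter


def matches_per_region(matches):
    """Count proposed matches per region; records without one go to 'unknown'."""
    return Counter(m.get("region", "unknown") for m in matches)


# Made-up sample records for illustration only.
proposed = [
    {"osm_id": "node/1", "wikidata": "Q64", "region": "DE: Berlin"},
    {"osm_id": "node/2", "wikidata": "Q84", "region": "UK: Greater London"},
    {"osm_id": "node/3", "wikidata": "Q84", "region": "UK: Greater London"},
    {"osm_id": "node/4", "wikidata": "Q90"},
]

# Largest batches first, so reviewers can see where the manual work would be.
for region, count in matches_per_region(proposed).most_common():
    print(f"{count:6d}  {region}")
```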

> I think the issues raised have been addressed; which do you feel have not been?

Specifically, comments such as "In my opinion, the risks of doing this 
automatically are just too high", "+1 to not import blindly but require 
human confirmation" and "that's why I was asking how you proposed to 
measure it" in those threads.

Cheers,

Andy





