[OSM-talk] Could we just pause any wikidata edits for a month or two?

Minh Nguyen minh at nguyen.cincinnati.oh.us
Wed Oct 11 09:45:51 UTC 2017


Great questions. I've attempted to answer a few of them below:

On 03/10/2017 09:56, Christoph Hormann wrote:
> * To what extent has there been information transferred systematically
> from Wikidata and Wikipedia to OSM based on wikidata ID references
> (like adding names in different languages).  As others have explained
> this would be legally problematic and it would be important to know how
> common this is.

I agree that there are questions about OSM's acceptance of labels and 
statements copied from Wikidata, though I would've expected this 
phenomenon to be at least as common with Wikipedia long before the 
introduction of the wikidata tag.

Years ago, there was a campaign to add as many translations of country 
names as possible, using Wikipedia as the primary source. [1] A map 
renderer that uses these translations would logically want translations 
or transliterations for as many cities as possible, but my impression is 
that the OSM community would frown on such a massive expansion in city 
name transliterations. Instead, we can point data consumers to Wikidata 
as a source for this data.

> * How stable is the identity of what can be found under a certain
> Wikidata ID.  As mentioned there are cases where Wikidata aggregates
> several concepts under one ID (like an administrative unit and a
> populated place in case of cities/towns).  Would it be possible that
> this changes?  If yes, would the original ID be re-purposed or would it
> cease to exist?

To the extent that an administrative unit and populated place are 
considered separate entities, as they are for some kinds of places, 
Wikidata ideally maintains separate entities for each. The reality is 
less clear-cut, since much of Wikidata's original data on geographic and 
political entities comes from Wikipedia, which generally doesn't make 
such distinctions at the article title level. The Wikidata project aims 
to eventually create separate entities for every concept that Wikipedia 
has traditionally conflated inside the same article. Thus Wikidata 
maintains a separate entity for each Pokémon species, whereas the 
English Wikipedia combines them all into a few list articles. [2][3]

If an administrative unit or populated place (or both) ceases to exist, 
the QID remains valid, but a statement or qualifier is added to indicate 
"former" status, much like OSM's lifecycle tags (disused etc.). An 
entity may be redirected under some circumstances. For example, if the 
Wikidata community discovers that two entities are duplicates, referring 
to exactly the same concept, an editor will manually blank one in favor 
of the other, and a bot will create a redirect automatically. [4]

Many of the duplicate entities were created as a result of incorrect 
linking between Wikipedia article translations at the time Wikipedia 
article titles were being imported into Wikidata. If someone had 
translated the article "Pumpkin" from English to Pennsylvania German but 
neglected to link the English article to the Pennsylvania German one, 
Wikidata might've wound up with two entities, one linking to many 
languages including English, the other linking to only Pennsylvania 
German. Most likely the latter entity would end up redirecting to the 
former.

The English Wikipedia sees a couple dozen geographical articles renamed 
each day. [5] This is a rough estimate based on articles tagged with 
geographical coordinates. I don't know how many of these articles are 
the target of wikipedia tags in OSM -- I think that would require Yuri's 
SPARQL tool.

But the important thing to note is that a redirect on Wikipedia may not 
remain a redirect for long: editors may decide to repurpose the redirect 
page for a disambiguation page or perhaps an article on a subtly 
different topic. If that happens, an OSM data consumer would have to 
trawl through article history to determine which article each wikipedia 
tag really meant to refer to. By comparison, since integers are cheap, 
Wikidata entities don't tend to get repurposed the way Wikipedia article 
titles do, so even a stale QID can be traced to relevant data pretty easily.
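
To illustrate what tracing a stale QID could look like for a data 
consumer, here is a minimal Python sketch. It assumes the consumer has 
already obtained a table of merge redirects from somewhere; the 
resolve_qid function, the max_hops limit, and the QIDs in the table are 
all made up for illustration and don't describe any real merge history 
or existing tool.

```python
def resolve_qid(qid, redirects, max_hops=10):
    """Follow merge redirects until reaching a QID that is not redirected.

    `redirects` maps a redirected QID to its target. It is a hypothetical
    input; how a consumer obtains it is outside the scope of this sketch.
    """
    seen = set()
    while qid in redirects:
        # Guard against loops or implausibly long chains in bad data.
        if qid in seen or len(seen) >= max_hops:
            raise ValueError(f"redirect chain too long or circular at {qid}")
        seen.add(qid)
        qid = redirects[qid]
    return qid


# Placeholder redirect table: two stale QIDs that lead to one entity.
redirects = {"Q900002": "Q900001", "Q900001": "Q900000"}
resolved = resolve_qid("Q900002", redirects)
```

A QID that was never merged is returned unchanged, so the same call 
works for every wikidata tag value, stale or not.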

> * What is the qualification of Wikidata for having its IDs in OSM (both
> for wikidata=* and X:wikidata=*)?  Is there a particular objective
> criterion that qualifies it?  Would there be other external IDs that
> would also qualify under these criteria?  Is there a limit in the
> number of different external IDs OSM is going to accept?

There are at least several other kinds of IDs that have been added in 
large numbers in the past. Off the top of my head, there are the various 
ref schemes used in conjunction with the heritage tag, GNIS feature IDs 
associated with an import of POIs in the U.S., and of course regulatory 
IDs such as ICAO/IATA.

Far from opening the floodgates to external IDs, Wikidata gives us the 
ability to limit external ID tagging. Consider that Wikidata lists seven 
different external identifiers for Hamilton County, Ohio, United States. 
[6] If someone ever proposes to tag U.S. counties with FIPS or GeoNames 
codes, we can point out that the feature is tagged with a Wikidata QID 
and the Wikidata entity is tagged with FIPS and GeoNames codes, making 
additional OSM tagging unnecessary. So we can consider Wikidata to be a 
meta external database, yet we still have the flexibility to bring in 
other external IDs if that's what the community decides to do.

> Also i think it would be of great importance for OSM and a functioning
> communication in the community to have better documentation of:
> 
> * systematic wikidata ID addition/editing efforts (there seems to be
> nothing listed currently on
> https://wiki.openstreetmap.org/wiki/Category:Automated_edits_log)
> * tag documentation of the wikidata tags.  This needs a lot of
> improvement.  Like:
> 
> https://wiki.openstreetmap.org/wiki/Key:wikidata does not make clear if
> these document 1:1 relationships between OSM features and wikidata
> objects or not and what qualifies a wikidata ID to be 'about a
> feature'.  How does a mapper practically verify if a certain wikidata
> ID is correct on a certain feature?

I agree, finding the most effective way to explain these relationships 
in documentation will be an ongoing effort for some time. One problem I 
commonly see in existing wikidata=* mapping is that, for example, all 
the locations of a restaurant chain are given the same wikidata tag. 
wikidata=* is designed to be a 1:1 relationship, at least for POIs and 
routes. (I suppose a company may have more than one headquarters, 
though.) Tags like brand:wikidata=* were devised to promote the wikidata 
tag's 1:1 relationship.
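
A quality-assurance check for this problem could be sketched roughly as 
follows. This is only an illustration in Python: the function name, the 
shape of the features dictionary, and the node IDs are assumptions of 
mine, not part of any existing validator.

```python
from collections import defaultdict


def shared_wikidata_values(features):
    """Return wikidata values that appear on more than one feature.

    `features` maps a feature ID to its tag dictionary (hypothetical shape).
    """
    by_qid = defaultdict(list)
    for feature_id, tags in features.items():
        qid = tags.get("wikidata")
        if qid:
            by_qid[qid].append(feature_id)
    # Keep only QIDs that break the intended 1:1 relationship.
    return {qid: ids for qid, ids in by_qid.items() if len(ids) > 1}


# Made-up features: two KFC locations mistakenly share the chain's QID,
# while the third uses brand:wikidata as intended.
features = {
    "node/1": {"name": "KFC", "wikidata": "Q524757"},
    "node/2": {"name": "KFC", "wikidata": "Q524757"},
    "node/3": {"name": "Terminal 1 KFC", "brand:wikidata": "Q524757"},
}
suspects = shared_wikidata_values(features)
```

A hit isn't necessarily an error, since one real-world object can be 
mapped as several OSM features, so the output is better treated as a 
list of candidates for a mapper to review.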

As for the practicality of verifying wikidata tags, I think it's 
important for editing software to fetch and show the label beside the 
QID wherever the QID appears. Perhaps also the description or "is a" 
statement.
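
As a rough sketch of the display side, the Python snippet below formats 
a QID together with its label and description. It assumes the response 
has the JSON shape returned by the Wikidata API's wbgetentities action 
when labels and descriptions are requested; the describe_qid function is 
hypothetical, and the sample response is hand-written rather than 
fetched.

```python
def describe_qid(qid, response, lang="en"):
    """Format a QID with its label and description, when available.

    `response` is assumed to look like a wbgetentities result:
    {"entities": {qid: {"labels": {lang: {"value": ...}}, ...}}}
    """
    entity = response.get("entities", {}).get(qid, {})
    label = entity.get("labels", {}).get(lang, {}).get("value")
    description = entity.get("descriptions", {}).get(lang, {}).get("value")
    if label is None:
        return qid  # Nothing to show beyond the bare identifier.
    if description is None:
        return f"{qid} ({label})"
    return f"{qid} ({label}: {description})"


# Hand-written sample in the assumed response shape, not live data.
sample = {
    "entities": {
        "Q524757": {
            "labels": {"en": {"language": "en", "value": "KFC"}},
            "descriptions": {"en": {"language": "en", "value": "fast food chain"}},
        }
    }
}
text = describe_qid("Q524757", sample)
```

Falling back to the bare QID when no label exists in the requested 
language keeps the tag readable even for sparsely described entities.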

> https://wiki.openstreetmap.org/wiki/Key:brand:wikidata is plain wrong,
> brand:wikidata=* is not a machine-readable form of brand=*.  It in
> particular needs to tell the mapper what types of wikidata object
> should be referenced here and how a mapper can find the correct ID for
> a certain feature.

I suppose that was a rhetorical flourish on my part. What would be the 
best way to describe the role played by the brand:wikidata value in this 
hypothetical example:

name=Terminal 1 KFC
brand=KFC
brand:wikidata=Q524757
operator=ACME Airport Concessions

where Q524757 is the Wikidata entry for KFC the fast food chain? To find 
Q524757, I went to https://en.wikipedia.org/wiki/KFC and clicked on 
"Wikidata item". This is what I would've done to find the QID of the 
London Eye to put in the London Eye's wikidata tag, if not for iD's 
basic Wikidata integration.

[1] 
http://web.archive.org/web/20121216044005/http://toolserver.org:80/~mazder/multilingual-country-list/
[2] https://www.wikidata.org/wiki/Q1647331
[3] https://en.wikipedia.org/wiki/List_of_generation_I_Pokémon#Raichu
[4] https://www.wikidata.org/wiki/Help:Merge
[5] https://quarry.wmflabs.org/query/22125
[6] https://www.wikidata.org/wiki/Q152891#identifiers

-- 
minh at nguyen.cincinnati.oh.us



