[OSM-talk] Fixing wiki* -> brand:wiki*

Andy Townsend ajt1047 at gmail.com
Thu Sep 28 13:01:13 UTC 2017


On 27/09/2017 17:14, Yuri Astrakhan wrote:
> * Problem #1:  In my analysis of OSM data, wikipedia tags quickly go 
> stale because they use Wikipedia page titles, and titles are 
> constantly renamed, deleted, and what's worse - old names are reused 
> for new meanings.  This is a fundamental problem with all Wikipedia 
> tags, such as wikipedia, brand:wikipedia, operator:wikipedia, etc, 
> that needs solving. The solution does not need to be perfect, it just 
> needs to be better than what we have.
>
> * Problem #2: the *meaning* of the "wikipedia" tag is ambiguous, and 
> therefore cannot be processed easily. The top three meanings I have 
> seen are:
>   a) This WP article is about this OSM feature (a so called 1:1 match, 
> e.g. city, famous building, ...)
>   b) This WP article is about some aspect of this OSM feature, like 
> its brand, tree species, or subject of the sculpture
>   c) Only a part of this WP article is about this OSM feature, e.g. a 
> WP list of museums in the area contains description of this museum.
>
> * Problem #3: data consumers need cleaner, more machine-processable 
> data. The text label is much more error prone than an ID:  McDonalds 
> vs mcdonalds vs McDonald's vs ..., so having "brand=mcdonalds" results 
> in many errors. Note that just because OSM default map skin may handle 
> some of them correctly, each data consumer has to re-implement that 
> logic, so the more ambiguous something is, the more likely it will 
> result in errors and data omissions.
>
> The brand:wikidata discussion is about #1, #2b, and #3.
>
> Are we in agreement that these are problems, or do you think none of 
> them need solving?

1)  Not a problem as such.  If something has changed on the wikipedia 
side then something may need checking on the OSM side.  It might be as 
simple as "someone's just renamed the wikipedia page" then fine just fix 
the link - but it needs a human to check it. What might have happened of 
course is that the object has changed in the real world (been renamed, 
moved, or changed in some other way) and the object in OSM needs a 
resurvey, or perhaps can be changed based on existing knowledge, but 
either way it still needs checking.

2b) If someone's added a wikipedia link to an OSM object that represents 
a tree to point to the wikipedia page of that type of tree, then that's 
not helpful.  There's no need for the link, since the tree type is 
already tagged in OSM.

3) This depends on the data consumer.  If you're simply trying to 
impress people with the volume of data that you have access to then you 
might indeed want a large number of unmaintainable extra links of 
dubious provenance.  Realistically though in my experience (as I've 
written elsewhere in this thread) data consumers do care about the 
quality of the data that they're processing, and the fact that the 
person adding the object spelt "McDonald's" differently is something 
that they may well have a view about.

In a different context I've written elsewhere about the work that went 
in to create the list at 
https://github.com/SomeoneElseOSM/SomeoneElse-style/blob/master/style.lua#L1401 
which involved looking at how people tagged certain sorts of features in 
OSM.  Free tagging is both a strength and a weakness of OSM - without it 
the data wouldn't get captured at all, but with it people do have to 
look at the data that's been added.  That is what data consumers do 
already, though.  You could argue that a "brand:wikidata" key makes their job 
easier, but if they want to do a proper job it probably doesn't make a 
lot of difference.
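
As a rough illustration of the kind of checking a data consumer ends up doing with free-text brand values, here is a minimal Python sketch. It is not taken from style.lua; the function names and the sample values are made up for illustration, and the normalisation rule (strip accents, punctuation, spaces and case) is only an assumption about what might count as "the same" brand.

```python
import re
import unicodedata
from collections import defaultdict


def normalise_brand(value):
    """Reduce a free-text brand value to a comparison key (illustrative only).

    Only suitable for Latin-script names: anything that is not a-z or 0-9
    after accent stripping is discarded.
    """
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Ignore case, then drop spaces, apostrophes and other punctuation.
    return re.sub(r"[^a-z0-9]", "", stripped.casefold())


def group_variants(values):
    """Group raw tag values by their normalised key so a human can review them."""
    groups = defaultdict(set)
    for value in values:
        groups[normalise_brand(value)].add(value)
    return dict(groups)


# Hypothetical sample of values seen in a "brand" or "name" tag.
sample = ["McDonalds", "mcdonalds", "McDonald's", "Mc Donald's", "Aldi"]
groups = group_variants(sample)
```

The grouping only suggests which spellings probably belong together; someone still has to look at each group and decide whether that is actually true.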

Another example - I recently looked at the usage of "natural=fell" in 
OSM with a view to rendering it.  It surprised me that this query 
http://overpass-turbo.eu/s/s2q showed at least 3 different types of 
objects with the same OSM tag.  A data consumer can't assume that what 
they thought that something meant (perhaps after reading the OSM wiki) 
is what mappers actually do - they'll need to filter the data they're 
consuming based on actual OSM usage.  In the case of "brand:wikidata" 
they may want to filter out obviously bot-added values because there was 
no local knowledge of that data and go back to what other tags the 
mappers added (in the case of the Aldis discussed elsewhere I suspect 
that there will always be enough info to say which is which in other tags 
or using geographic location).
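
The filtering described above could look something like the following Python sketch. Everything in it is an assumption made for illustration: the record layout, the "wikidata_added_by" field (a consumer would have to derive that from object history), the BOT_ACCOUNTS list and the placeholder ID are all hypothetical, not something OSM provides directly.

```python
# Hypothetical account names; a real consumer would maintain its own list.
BOT_ACCOUNTS = {"example_bot"}


def trusted_wikidata(obj):
    """Return the brand:wikidata value unless a known bot account added it."""
    qid = obj["tags"].get("brand:wikidata")
    if qid is None or obj.get("wikidata_added_by") in BOT_ACCOUNTS:
        return None
    return qid


def brand_evidence(obj):
    """Prefer a trusted ID, otherwise fall back to tags the mapper entered."""
    qid = trusted_wikidata(obj)
    if qid is not None:
        return ("wikidata", qid)
    for key in ("brand", "operator", "name"):
        if key in obj["tags"]:
            return (key, obj["tags"][key])
    return None


# Hypothetical record: the ID is a placeholder, not a real Wikidata item.
shop = {
    "tags": {"shop": "supermarket", "name": "Aldi", "brand:wikidata": "Q_EXAMPLE"},
    "wikidata_added_by": "example_bot",
}
evidence = brand_evidence(shop)
```

For the record above the bot-added ID is ignored and the consumer falls back to the "name" tag, which can then be combined with geographic location to decide which Aldi it is.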

Best Regards,

Andy
