[OSM-talk] Fixing wiki* -> brand:wiki*
ajt1047 at gmail.com
Thu Sep 28 13:01:13 UTC 2017
On 27/09/2017 17:14, Yuri Astrakhan wrote:
> * Problem #1: In my analysis of OSM data, wikipedia tags quickly go
> stale because they use Wikipedia page titles, and titles are
> constantly renamed, deleted, and what's worse - old names are reused
> for new meanings. This is a fundamental problem with all Wikipedia
> tags, such as wikipedia, brand:wikipedia, operator:wikipedia, etc,
> that needs solving. The solution does not need to be perfect, it just
> needs to be better than what we have.
> * Problem #2: the *meaning* of the "wikipedia" tag is ambiguous, and
> therefore cannot be processed easily. The top three meanings I have
> seen are:
> a) This WP article is about this OSM feature (a so called 1:1 match,
> e.g. city, famous building, ...)
> b) This WP article is about some aspect of this OSM feature, like
> its brand, tree species, or subject of the sculpture
> c) Only a part of this WP article is about this OSM feature, e.g. a
> WP list of museums in the area contains description of this museum.
> * Problem #3: data consumers need cleaner, more machine-processable
> data. The text label is much more error prone than an ID: McDonalds
> vs mcdonalds vs McDonald's vs ..., so having "brand=mcdonalds" results
> in many errors. Note that just because OSM default map skin may handle
> some of them correctly, each data consumer has to re-implement that
> logic, so the more ambiguous something is, the more likely it will
> result in errors and data omissions.
> The brand:wikidata discussion is about #1, #2b, and #3.
> Are we in agreement that these are problems, or do you think none of
> them need solving?
1) Not a problem as such. If something has changed on the wikipedia
side then something may need checking on the OSM side. It might be as
simple as "someone's just renamed the wikipedia page", in which case fine, just fix
the link - but it needs a human to check it. What might have happened of
course is that the object has changed in the real world (been renamed,
moved, or changed in some other way) and the object in OSM needs a
resurvey, or perhaps can be changed based on existing knowledge, but
either way it still needs checking.
2b) If someone's added a wikipedia link to an OSM object that represents
a tree to point to the wikipedia page of that type of tree, then that's
not helpful. There's no need for the link, since the tree type is
already tagged in OSM.
3) This depends on the data consumer. If you're simply trying to
impress people with the volume of data that you have access to then you
might indeed want a large number of unmaintainable extra links of
dubious provenance. Realistically though in my experience (as I've
written elsewhere in this thread) data consumers do care about the
quality of the data that they're processing, and the fact that the
person adding the object spelt "McDonald's" differently is something
that they may well have a view about.
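
To illustrate the spelling point, here is a minimal sketch of how a data
consumer might group variant spellings of a brand label before deciding what
to do with them. The function names (normalise_brand, group_spellings) and the
folding rules are assumptions made for illustration only; a real consumer
would choose its own rules and would probably want to review the groups by
hand.

```python
import re
import unicodedata
from collections import defaultdict


def normalise_brand(label: str) -> str:
    """Fold case, strip accents, and drop punctuation and whitespace."""
    # Decompose accented characters so the combining marks can be removed.
    decomposed = unicodedata.normalize("NFKD", label)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Keep only lower-case ASCII letters and digits.
    return re.sub(r"[^a-z0-9]", "", without_marks.casefold())


def group_spellings(labels):
    """Map each normalised key to the set of raw spellings that produced it."""
    groups = defaultdict(set)
    for label in labels:
        groups[normalise_brand(label)].add(label)
    return dict(groups)


# The spellings quoted in this thread, plus one unrelated brand.
groups = group_spellings(["McDonalds", "mcdonalds", "McDonald's", "Aldi"])
```

Grouping rather than overwriting keeps the mappers' original spellings
available, so a human can still look at what was actually entered.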
In a different context I've written elsewhere about the work that went
in to create the list at
which involved looking at how people tagged certain sorts of features in
OSM. Free tagging is both a strength and a weakness of OSM - without it
the data wouldn't get captured at all, but with it people do have to
look at the data that's been added - but it's what data consumers do
already. You could argue that a "brand:wikidata" key makes their job
easier, but if they want to do a proper job it probably doesn't make a
lot of difference.
Another example - I recently looked at the usage of "natural=fell" in
OSM with a view to rendering it. It surprised me that this query
http://overpass-turbo.eu/s/s2q showed at least 3 different types of
objects with the same OSM tag. A data consumer can't assume that what
they thought that something meant (perhaps after reading the OSM wiki)
is what mappers actually do - they'll need to filter the data they're
consuming based on actual OSM usage. In the case of "brand:wikidata"
they may want to filter out obviously bot-added values because there was
no local knowledge of that data and go back to what other tags the
mappers added (in the case of the Aldis discussed elsewhere I suspect
that there will always be enough info to say which is which in other tags
or using geographic location).
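
As a rough sketch of what "filtering based on actual OSM usage" can look like
in practice, the snippet below counts which other tags accompany a given tag.
It assumes the elements have already been downloaded and are shaped like the
"elements" list in Overpass JSON output (dicts with "type", "id" and "tags").
The function name co_occurring_tags and the sample data are hypothetical and
are not taken from the query above.

```python
from collections import Counter


def co_occurring_tags(elements, key, value):
    """Count the other key=value pairs found on elements tagged key=value."""
    counts = Counter()
    for element in elements:
        tags = element.get("tags", {})
        if tags.get(key) != value:
            continue  # not the tag we are studying
        for other_key, other_value in tags.items():
            if other_key != key:
                counts[(other_key, other_value)] += 1
    return counts


# Hypothetical sample data, for illustration only.
sample = [
    {"type": "way", "id": 1, "tags": {"natural": "fell", "name": "Example Fell"}},
    {"type": "node", "id": 2, "tags": {"natural": "fell", "ele": "700"}},
    {"type": "way", "id": 3, "tags": {"natural": "fell", "name": "Example Fell"}},
    {"type": "way", "id": 4, "tags": {"natural": "wood"}},
]
usage = co_occurring_tags(sample, "natural", "fell")
```

A tally like this only shows what mappers have done; deciding which of the
resulting groups to render or keep still needs someone to look at the data.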