[Tagging] Part/whole confusion with Wikidata tag, and the need for enveloping parts into a whole

Michael Reichert osm-ml at michreichert.de
Wed Aug 8 16:20:04 UTC 2018


Hi peterkrauss,

On 08.08.2018 at 03:18, Nelson A. de Oliveira wrote:
> On Tue, Aug 7, 2018 at 9:22 PM, Yuri Astrakhan <yuriastrakhan at gmail.com> wrote:
>> Nelson, there are several places I have seen in our wiki, e.g. [1], which
>> discourage duplication of information if it can be avoided. name is a
>> special case - it helps mappers to quickly identify what the object
>> represents. If we duplicated everything, then each part of a railroad
>> station should have duplicate web site URL, hours of operation, operator
>> name, and tons of other info. Having duplicates leads to inconsistencies,
>> makes data harder to maintain, etc.  For example, if two parts of the station have
>> different hours of operation - is that a mistake (someone forgot to update
>> both), or is it intentional? Which one of two is correct? Having a rule to
>> keep common info in a relation unless it is different makes data more
>> valuable and less error-prone.
> 
> I was talking about any object.
> And I fail to see what exactly is *wrong* in having multiple parts of
> an object with the same wikidata; it's not really duplication.
> 
> We don't create relations to avoid repeating surface, lanes, name, etc
> on every part of a highway, for example.
> Using relations also has the drawback of creating complexity for most
> of the users in OSM (and sometimes even for the data consumers),
> especially if the main objective here is to solely avoid non-unique
> wikidata values.

I second that.

Our data model is different from the data model of Wikidata.

When mapping railway lines, it is common practice to add tags which refer
to the railway line (as infrastructure, not the services using it) to
the individual ways.

Example: Tags of a way

railway=rail
name=Linke Rheinstrecke (English: Left Rhine Railway)
wikipedia=de:Linke Rheinstrecke
wikidata=Q…
operator=DB Netz AG

This duplication is common practice in OSM even if it looks wrong to
proponents of the pure doctrine of database design. OSM is not
modified by a frontend application hiding the database model from the
user. Instead, it is edited by humans who have to understand the
database model. Many of them already struggle to understand
relations at all.
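As a rough illustration of how a data consumer might still recover the
"whole" from parts tagged this way, the sketch below groups ways by their
shared wikidata value. The sample data, the placeholder IDs "Q_EXAMPLE"
and "Q_OTHER", and the helper group_by_tag are hypothetical and only
serve this example; they are not taken from any OSM tool.

```python
from collections import defaultdict

# Hypothetical sample data: ways 1 and 2 are parts of the same railway line.
# "Q_EXAMPLE" and "Q_OTHER" are placeholders, not real Wikidata IDs.
ways = [
    {"id": 1, "tags": {"railway": "rail", "name": "Linke Rheinstrecke",
                       "wikidata": "Q_EXAMPLE", "operator": "DB Netz AG"}},
    {"id": 2, "tags": {"railway": "rail", "name": "Linke Rheinstrecke",
                       "wikidata": "Q_EXAMPLE", "operator": "DB Netz AG"}},
    {"id": 3, "tags": {"railway": "rail", "name": "Some other line",
                       "wikidata": "Q_OTHER"}},
]


def group_by_tag(objects, key):
    """Map each value of the tag `key` to the IDs of the objects carrying it."""
    groups = defaultdict(list)
    for obj in objects:
        value = obj["tags"].get(key)
        if value is not None:
            groups[value].append(obj["id"])
    return dict(groups)


# All parts of one line share the same wikidata value, so grouping on that
# tag collects the parts without any relation.
parts_of_line = group_by_tag(ways, "wikidata")
print(parts_of_line)
```

A non-unique wikidata value is then simply the key under which all parts
of the same object are found.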

That's why we do not create a relation to collect all objects having the
same operator. Instead, we add operator=<name> to all the objects. Such
relations are called "collective relations" and are not welcome.

https://wiki.openstreetmap.org/wiki/Relations/Relations_are_not_Categories

Route relations for railway lines (infrastructure, not the services) are
among the exceptions to that rule.

OSM is a project with a lot of data. If we moved much more of the
information which is currently "duplicated" as tags on individual
objects into relations, processing of OSM data would become more
difficult, more expensive and slower due to the many JOINs (I assume you
are more familiar with database terminology).
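To make the JOIN argument concrete, here is a minimal sketch using an
in-memory SQLite database. The table layout (way_tagged, way_plain,
relation, relation_member) and the sample rows are assumptions made up
for this comparison; they do not correspond to the schema of any real
OSM import tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Model A (assumed layout): the operator is a tag on every way.
    CREATE TABLE way_tagged (id INTEGER PRIMARY KEY, operator TEXT);
    INSERT INTO way_tagged VALUES
        (1, 'DB Netz AG'), (2, 'DB Netz AG'), (3, 'Other operator');

    -- Model B (assumed layout): the operator is stored once on a relation
    -- and the ways are only members of that relation.
    CREATE TABLE way_plain (id INTEGER PRIMARY KEY);
    CREATE TABLE relation (id INTEGER PRIMARY KEY, operator TEXT);
    CREATE TABLE relation_member (relation_id INTEGER, way_id INTEGER);
    INSERT INTO way_plain VALUES (1), (2), (3);
    INSERT INTO relation VALUES (10, 'DB Netz AG'), (11, 'Other operator');
    INSERT INTO relation_member VALUES (10, 1), (10, 2), (11, 3);
""")

# Model A: a single-table lookup, no JOIN needed.
tagged = conn.execute(
    "SELECT id FROM way_tagged WHERE operator = ? ORDER BY id",
    ("DB Netz AG",),
).fetchall()

# Model B: the same question needs two JOINs through the membership table.
via_relation = conn.execute(
    """
    SELECT w.id
    FROM way_plain AS w
    JOIN relation_member AS m ON m.way_id = w.id
    JOIN relation AS r ON r.id = m.relation_id
    WHERE r.operator = ?
    ORDER BY w.id
    """,
    ("DB Netz AG",),
).fetchall()

print(tagged, via_relation)
```

Both queries return the same ways, but the second one has to go through
the membership table for every attribute that was moved to a relation.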

Best regards

Michael


-- 
I prefer GPG encryption of emails. (Does not apply to mailing lists.)
