[Tagging] data model (was: Re: Feature Proposal - RFC - shop as post-partner)

Sat Feb 27 20:32:21 UTC 2021

On 2021-02-27 at 18:55, Robin Burek wrote:

>> Very true, same here, but that shouldn't stop you from using different
>> nodes which are closely mapped to each other. When someone searches
>> for a specific service they will show up on the same screen. It makes
>> it also a feasible concept and easy to understand for beginner mappers.
>
> Then it is mapping for the renderer. You create extra nodes to show
> multiple services of one shop/amenity etc. to display it on the map.
> And really - I don't understand it after years. Why should I map a shop
> four or five times?!?! There is only ONE SHOP.

Go one abstraction level up from that and you will see: The data model
is very naive, it follows the KISS principle - i.e. there is a relatively
small number of keys like shop, amenity or office, and each of them can
have exactly one value. This makes it easy for mappers and tool
developers, but at the cost of strongly limiting how much information
of one kind can be attached to one OSM object (e.g. two types of offices
are impossible), thus tempting mappers to create multiple OSM objects
for the very same physical object (as in the DE post partner scenario,
but not the UK scenario, where multiple physical objects also exist
according to Paul).
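
To make the one-value-per-key limitation concrete, here is a minimal
Python sketch. It assumes an OSM object's tags can be modelled as a flat
key -> value mapping; the object, its name and the office values are
made up for illustration only.

```python
# Assumption: tags behave like a flat mapping, so a key holds one value.
# The object and all values here are made-up examples.
kiosk = {"shop": "kiosk", "name": "Example Kiosk"}

# A second value for the same key replaces the first one.
kiosk["office"] = "insurance"
kiosk["office"] = "travel_agent"
print(kiosk["office"])

# The workaround discussed in this thread: several OSM objects
# for one and the same physical shop.
objects = [
    {"shop": "kiosk", "name": "Example Kiosk"},
    {"office": "insurance"},
    {"office": "travel_agent"},
]
print(len(objects), "objects for one physical shop")
```

The first office value is silently lost, which is exactly why mappers
end up with the three-object variant.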

That says it all - the rest of this mail just refines/details what
was already said.

This temptation becomes much stronger because organizational procedures
& structures make it very, very tedious to reach agreement on improved
tagging. Even for stuff that appears simple at first, like for instance
post partners: Everybody agrees these objects are desired in the OSM
database, everybody agrees those partners are not post offices in the
traditional sense so the existing & well defined tag amenity=post_office
shall not be used, everybody agrees the offered services vary greatly so
they must be explicitly tagged, everybody sees one shop may partner with
multiple operators so these need to be explicitly tagged,... So a lot of
agreement & consensus. Still, we have been unable to agree on a proper
solution (=feasible and globally usable and mostly clean in a logical
respect) for years. Yes, years, because
https://wiki.openstreetmap.org/wiki/Proposed_features/shop_as_post-partner
is just the most recent attempt, IMHO a quite mature + feasible one.
Yet, you can read in this thread how strong the "stick to what we have
even though we all see it's not good!" attitude is.

This temptation was not reduced much by the introduction of means to
make ambiguous tags distinct, i.e. prefixes & suffixes. In high level
theory, they allow us to tell apart e.g. the tag "operator" for the
kiosk/supermarket/... as such, for the postal partners, the lottery
partners etc., but on the hands-on operational level we have no _clear
consensus_ on how _exactly_ this prefixing/suffixing shall be done (also
see e.g. the tags "source" or "check_date"), i.e. whether via a prefix
or a suffix, how to name the prefix/suffix, and what the values shall
look like exactly. Additionally, in some cases there may be no
prefix/suffix at all but still the same meaning. This means we have a
high degree of variance, and this is bad (see below for details, e.g. it
limits the tooling support).
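
A small Python sketch of what this variance means for a tool. All key
spellings below are hypothetical stand-ins, none of them is claimed to
be an agreed or documented scheme; the function name is made up as well.

```python
# Hypothetical key spellings standing in for the prefix/suffix variance.
CANDIDATE_KEYS = ["post:operator", "operator:post", "post_operator"]

def postal_operator(tags):
    """Return the postal partner's operator, trying each spelling in turn."""
    for key in CANDIDATE_KEYS:
        if key in tags:
            return tags[key]
    # A plain "operator" key is ambiguous (shop or postal partner?),
    # so this sketch deliberately does not fall back to it.
    return None

print(postal_operator({"shop": "kiosk", "operator:post": "Example Post"}))
print(postal_operator({"shop": "kiosk", "operator": "Example Shop Ltd"}))
```

Every additional spelling in use means one more entry in that list, in
every tool that wants to read the data.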

Seeing this, it is not really surprising that people did create multiple
OSM objects for one physical object and will continue to do so - if
only because of the lack of accepted alternatives, combined with a
missing will to invest the energy and many hours to stand up for a
better tagging scheme.

What to do with this insight? How to face it?

We introduced means to help mappers be more consistent, e.g. wiki
pages and the name-suggestion-index (abbreviated nsi). This "just" makes
it easier to work with the naive data model, but does not address its
limitations.

As a hack to keep the naive data model but overcome some limitations, we
introduced means to "group" related stuff, e.g. relations or unique
wikidata IDs, but they do not really solve the issue. They cause even
more tagging alternatives to exist in parallel, i.e. more variance,
and this is bad:
* It is quite challenging to properly document. Many information
fragments are scattered across many OSM wiki pages and other web pages.
We need to tell when to use what. We need to keep everything compatible.
When one fragment is updated, all others need to be checked/updated.
* It makes it difficult to understand. Honestly, I am a long time mapper
and I have studied quite complex IT stuff, but still, I never _really_
understood the route relations. And looking e.g. at public transport or
piste relations, it seems I am not the only one who has not mastered
them, because quite a lot of them are incomplete, broken, inconsistent
etc. My personal top reason is inconsistent documentation.
* It makes it difficult to implement tools: How shall you correctly
implement a concept you did not really deeply understand? The tools need
context awareness. They need many nested if-then-else etc.
* It causes the actual data set to also contain data sticking to
outdated rules, making all of the above even more difficult
(documentation needs to fit even more variances, understanding by
examples becomes more difficult, tool support becomes more difficult)

My conclusion: As long as we have the very naive data model _and_
either no will or no organizational means to efficiently move
stepwise to a tagging scheme that allows mapping two pieces of
information of one kind
(https://wiki.openstreetmap.org/wiki/Proposed_features/shop_as_post-partner
is exactly one such step as it can be combined with any
office/amenity/shop values in only 1 OSM object), the best we can do is
to create several nodes and group them by
https://wiki.openstreetmap.org/wiki/Relations/Proposed/Site (a more
general-purpose approach) or by spatial relation (node within polygon,
so Bert's primary suggestion).
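
As a rough illustration of the "node within polygon" grouping, here is
a minimal Python sketch using the standard ray casting test. It assumes
planar coordinates (fine for something as small as a building); the
building outline, node names and positions are made up, and points
exactly on the outline are not handled specially.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count how often a ray to the right crosses the outline."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y matter.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Made-up building outline and nodes.
building = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
nodes = {
    "kiosk": (1.0, 1.0),
    "post partner": (3.0, 2.0),
    "bakery next door": (6.0, 1.0),
}

grouped = [name for name, pos in nodes.items()
           if point_in_polygon(pos, building)]
print(grouped)
```

A data consumer can treat all nodes found inside the same building
polygon as belonging together, without any explicit relation.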

Greetings, Georg