[Tagging] Values in namespaces/prefixes/suffixes Considered Harmful - Or: Stop over-namespacing and prefix-fooling

Stefan Keller sfkeller at gmail.com
Thu Dec 27 01:04:08 UTC 2018


bkil,

On Wed, 26 Dec 2018 at 20:05, bkil <bkil.hu+Aq at gmail.com> wrote in
thread "[Tagging] Feature Proposal - RFC - Top up":
> Stefan, I think most of us here do not fully understand your hard arguments, but if you could please elaborate a bit more or give some more examples, maybe we could better address your concerns.
> Anyway, this question sounds a bit orthogonal to the proposal at hand. Could anyone please link to a previous discussion with arguments in this topic? I'm absolutely sure

Thanks for asking. I'm opening a new thread here.

I'm sure too, and I know that somebody took the time to write up a
matrix of pros and cons of the options, but I can't recall where right now.

> Getting back to the proposal at hand, how would you map this place?
>
> top_up:phone:Vodafone=yes
> top_up:phone:Telekom=yes
> top_up:phone:Telenor=yes
> top_up:phone:Blue_mobile=no
> top_up:transport=yes
>
> Which one would cut it instead in your opinion?
> top_up:phone=Vodafone;Telekom;Telenor
> top_up:transport=yes

I can't follow what "transport" means here, but yes: the latter would
be one alternative.
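
A minimal Python sketch of how the namespaced form from the quoted
example folds into the semicolon form. It assumes tags are a plain dict
of strings; `collapse_namespace` is a hypothetical helper, not part of
any OSM library.

```python
def collapse_namespace(tags: dict[str, str], prefix: str) -> dict[str, str]:
    """Fold prefix:<value>=yes keys into one prefix=<v1>;<v2> tag.

    Keys under the prefix whose value is not "yes" (e.g. "no") are
    dropped, so explicit negatives do not survive the fold.
    """
    result: dict[str, str] = {}
    values: list[str] = []
    ns = prefix + ":"
    for key, value in tags.items():
        if key.startswith(ns):
            if value == "yes":
                values.append(key[len(ns):])  # the "value" hidden in the key
        else:
            result[key] = value  # unrelated tags pass through unchanged
    if values:
        result[prefix] = ";".join(values)
    return result


tags = {
    "top_up:phone:Vodafone": "yes",
    "top_up:phone:Telekom": "yes",
    "top_up:phone:Telenor": "yes",
    "top_up:phone:Blue_mobile": "no",
    "top_up:transport": "yes",
}
print(collapse_namespace(tags, "top_up:phone"))
# {'top_up:transport': 'yes', 'top_up:phone': 'Vodafone;Telekom;Telenor'}
```

Note that the explicit "top_up:phone:Blue_mobile=no" disappears in the
fold: the semicolon form lists what is offered, not what is explicitly
absent.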

This misuse of prefixes is called "over-namespacing" [1]. Read the
fifth sentence of the introduction in [1]; let me summarize the
problems of over-namespacing:

1. it hinders tag/key reuse, and
2. it fragments the data schema.

Take the bad example of "service:bicycle:*": the value "retail" in
"service:bicycle:retail=yes" could just as well apply to other
shops.

A proper scheme would be "service=*" (or at least
"bicycle_service=*"), with key "service" and values such as
"retail;repair;second_hand;...". I know that semicolons are not easy
to process, but:

Putting values in keys in the form "subkey:value1=yes/no;
subkey:value2=yes/no" instead of "subkey=value1;value2" is worse: it is
detrimental to all data management, from capture to analysis.

So, I tend to call this "prefix-fooling".

It really turns the processing of key-values (or key-value pairs KVP,
entity-attribute-values EAV, dictionaries, associative arrays, map
collections, hash stores/hstores) into an absurdity. In addition to the
problems of over-namespacing mentioned above, prefix-fooling has the
following consequences, among others (sticking with the example
"service:bicycle:retail=yes;service:bicycle:repair=yes;"):

* Existing code to validate and clean up values is useless: one can't
check with the usual functions whether a value is in the range
"retail,repair,second_hand".
* Existing code for matching is useless too: prefix-fooled keys end up
with mixed case (which keys shouldn't have).
* Worse, users still extend "yes/no" values to arbitrary values (which
again makes processing unnecessarily complicated).
* Even worse, users are encouraged to invent new, sparsely used keys
(which we can't prevent, but which would be less harmful as values).
* Source code is flooded with boolean expressions (which would
otherwise be a single function call) that have to be predefined in the
code (instead of being kept in the values).
* Values in namespaces/prefixes/suffixes are hard or impossible to
search, match, count or group in computer languages, including SQL.
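
The validation and counting points can be sketched in Python for both
representations. Assumptions: tags are plain string dicts, the allowed
range is the "retail,repair,second_hand" set named above, and the
function names and sample objects are hypothetical (the misspelled key
is the one from [2]; "Mobile" is the value suggested for it).

```python
from collections import Counter

# Value range from the text above; in practice it would come from the wiki.
ALLOWED = {"retail", "repair", "second_hand"}
PREFIX = "service:bicycle:"  # must be hard-coded for the namespaced scheme


def split_values(raw: str) -> list[str]:
    """Split an OSM multi-value string on ';' and trim whitespace."""
    return [v.strip() for v in raw.split(";") if v.strip()]


# Value scheme: service=retail;repair
def invalid_values(tags: dict[str, str]) -> list[str]:
    """One key lookup, then an ordinary set-membership test per value."""
    return [v for v in split_values(tags.get("service", "")) if v not in ALLOWED]


def count_values(objects: list[dict[str, str]]) -> Counter:
    """Group and count services across objects by value."""
    counts: Counter = Counter()
    for tags in objects:
        counts.update(split_values(tags.get("service", "")))
    return counts


# Namespaced scheme: service:bicycle:retail=yes
def invalid_suffixes(tags: dict[str, str]) -> list[str]:
    """Every key must be scanned and the 'value' recovered from its name."""
    return [key[len(PREFIX):] for key in tags
            if key.startswith(PREFIX) and key[len(PREFIX):] not in ALLOWED]


def count_suffixes(objects: list[dict[str, str]]) -> Counter:
    """Counting needs the prefix plus an interpretation of yes/no."""
    counts: Counter = Counter()
    for tags in objects:
        for key, value in tags.items():
            if key.startswith(PREFIX) and value == "yes":
                counts[key[len(PREFIX):]] += 1
    return counts


value_objs = [{"service": "retail;repair"}, {"service": "repair;Mobile"}]
ns_objs = [
    {"service:bicycle:retail": "yes", "service:bicycle:repair": "yes"},
    {"service:bicycle:repair": "yes", "service:bicycle:Mobiler_Servive": "yes"},
]

print(invalid_values(value_objs[1]))  # ['Mobile']
print(invalid_suffixes(ns_objs[1]))   # ['Mobiler_Servive']
print(count_values(value_objs))       # Counter({'repair': 2, 'retail': 1, 'Mobile': 1})
```

Both variants are short in Python; the difference is that the value
scheme relies on generic string and set operations applied to one known
key, while the namespaced scheme has to hard-code the prefix and the
yes/no convention into every check.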

This list is incomplete, but let me illustrate the points above with a
specific example [2], which currently has
"service:bicycle:Mobiler_Servive=yes". Here someone at least wrote
"yes" correctly, but they 1. used a mix of upper case and "_",
2. misspelled "Service" as "Servive", and 3. invented a German key
(which would be OK with me if the user had invented "service=Mobile"
instead).

As a consequence, the OSM Wiki tends to be flooded with Key pages (one
for "service:bicycle:retail", one for "service:bicycle:second_hand",
etc.), and one can't search for partial key names (like "service") in
the Wiki, in editor presets [3] or in Overpass (regex "*service*" is an
error in Overpass), etc.
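
A taginfo-style list of distinct keys shows why the Wiki fills up with
Key pages. The sample data is hypothetical, and `distinct_keys` is only
a rough stand-in for what a key statistics service reports: in the
namespaced form each service becomes its own key, while the value form
uses a single key.

```python
def distinct_keys(objects: list[dict[str, str]]) -> list[str]:
    """Return the sorted set of keys used across all objects."""
    return sorted({key for tags in objects for key in tags})


# Hypothetical sample data: the same three services in both schemes.
namespaced = [
    {"service:bicycle:retail": "yes", "service:bicycle:repair": "yes"},
    {"service:bicycle:second_hand": "yes"},
]
valued = [
    {"service": "retail;repair"},
    {"service": "second_hand"},
]

print(distinct_keys(namespaced))
# ['service:bicycle:repair', 'service:bicycle:retail', 'service:bicycle:second_hand']
print(distinct_keys(valued))
# ['service']
```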

To conclude: I consider values in namespaces/prefixes/suffixes harmful
and I hope this thread helps to avoid and correct over-namespacing and
prefix-fooling.

:Stefan

[1] https://wiki.openstreetmap.org/wiki/Namespace
[2] https://www.openstreetmap.org/node/4672943370
[3] https://2018.stateofthemap.org/2018/T061-An_excursion_in_to_the_world_of_OSM_tagging_presets/

On Wed, 26 Dec 2018 at 20:05, bkil <bkil.hu+Aq at gmail.com> wrote:
>
> Stefan, I think most of us here do not fully understand your hard arguments, but if you could please elaborate a bit more or give some more examples, maybe we could better address your concerns. Anyway, this question sounds a bit orthogonal to the proposal at hand. Could anyone please link to a previous discussion with arguments on this topic? I'm absolutely sure that it comes up annually around here, but I'm a newbie here, so I can't tell off the top of my head.
>
> On our local list, this argument usually comes up the other way around: I usually want to endorse using as many semicolons as possible to ease the work of mappers, while everyone else lists counterarguments that boolean alternatives are the upcoming norm.
>
> The socket key grew out of power_supply; check how it is used in taginfo. Consider the following example:
> socket:cee_17_blue=2
> socket:cee_7_3=yes
>
> It indicates that we have 2 sockets of the first, and they also have some of the second kind, but we don't know how many. Perhaps it came from an import, from memory, or there was simply not enough time to count them all. How else would you tag this?
>
> Getting back to the proposal at hand, how would you map this place?
>
> top_up:phone:Vodafone=yes
> top_up:phone:Telekom=yes
> top_up:phone:Telenor=yes
> top_up:phone:Blue_mobile=no
> top_up:transport=yes
>
> Which one would cut it instead in your opinion?
> top_up:phone=Vodafone;Telekom;Telenor
> top_up:transport=yes
>
> Or this one:
> top_up=phone:Vodafone;phone:Telekom;phone:Telenor;transport
>
> Or try to translate this example:
> top_up:phone=yes
> top_up:transport=yes
>
> Would it correspond to this?
> top_up=phone:transport
>
> Given proper presets & UI, a mapper simply ticks some boxes and is done with it; no typing needed. And anyway, I use the contact:* schema extensively and I don't feel it slows me down; it's just a matter of learning to touch type or using proper autocompletion.
>
> From a performance perspective, if one has a Telenor card and wants to top up, geolocating a place is as simple as looking up top_up:phone:Telenor=yes in the granular case using a DB index/map (key-value based bigdata storage also shines here). If we crammed everything into the same top_up or top_up:phone field, we would need regexp lookups that are much less efficient. Although, if this was the only drawback, we would have the option to build an intermediate shadow database from the master OSM just for the purpose of efficient lookups (basically normalizing to the same form as seen above).
>
> Actually, the best solution would be to combine the advantages of both. It would not be very difficult to come up with an editor in which you could enter top_up:phone=Telenor,Vodafone,Telekom and it would automatically expand to the above form on pressing enter (including the missing entries defaulting to *=no!). Namespacing has the added benefit that alphabetical sorting puts related keys next to each other (the same advantage as for contact:*=*), although an interface could choose to compress these as it wishes.
>
> Full disclosure: up to now, I was happy to use semicolons in cuisine=*, as I don't expect people to do lookups for these and there are sometimes a dozen of them, but this does cause me sleepless nights. Fortunately, editors support checkboxes for this field in this scheme.
>
> On Wed, Dec 26, 2018 at 6:03 PM Stefan Keller <sfkeller at gmail.com> wrote:
>>
>> On Wed, 26 Dec 2018 at 16:47, Martin Koppenhoefer
>> <dieterdreist at gmail.com> wrote:
>> > For practical reason, I would expect a scheme
>> > characteristic_I_need_to_know=yes/no
>> >
>> > much easier to evaluate than one like:
>> > some_services=foo;characteristic_I_need_to_know;bar
>>
>> No, it's not easier. The following
>> some_services_foo=yes/no
>> some_services_characteristic_I_need_to_know=yes/no
>> some_services_bar=yes/no
>>
>> is three times more to read and write for humans, as compared to
>> some_services=foo;characteristic_I_need_to_know;bar
>>
>> and - again:
>>
>> The form "detail:value:sub_value(:...)=?"
>> (1) breaks fundamental(!) assumptions in OSM (that a tag consists of
>> a key and value(s)), and (2) it breaks programming principles (that
>> an attribute name has value(s)).
>> So it's obvious why the Wiki and taginfo and you name it can't cope
>> with it. I'm sorry, but it's hard to be more clear and explicit than
>> that.
>>
>> And for OSM's sake I hope it does not become common, even though
>> there are other bad examples like recycling or service:bicycle [1].
>>
>> :Stefan
>>
>> P.S. Note that there are alternatives, especially to the boolean
>> yes/no/unknown case, and that tagging schemes like "socket" [2] are
>> acceptable since they are still about a value in the key=value
>> pair.
>>
>> [1] https://taginfo.openstreetmap.org/search?q=service%3Abicycle
>> [2] https://wiki.openstreetmap.org/wiki/Key:socket
>>
>> On Wed, 26 Dec 2018 at 16:47, Martin Koppenhoefer
>> <dieterdreist at gmail.com> wrote:
>> >
>> > > On 26. Dec 2018, at 15:08, Stefan Keller <sfkeller at gmail.com> wrote:
>> > >
>> > > Tag-proposals in the form
>> > > <tag_attr_name>:<type_value>[:<subtype_value>]=yes/no should be
>> > > avoided. It's shifting values to attribute names!
>> >
>> >
>> > it's not a value, it's a property ;-)
>> > it depends on your interpretation, e.g. motorroad=yes and
>> > oneway=yes:
>> >
>> > aren't these values, and should we tag them
>> > road_restrictions=motorroad;oneway?
>> >
>> >
>> > top_up:phone=yes
>> > means: provides phone top up.
>> > For practical reason, I would expect a scheme
>> > characteristic_I_need_to_know=yes/no
>> >
>> > much easier to evaluate than one like:
>> > some_services=foo;characteristic_I_need_to_know;bar
>> >
>> >
>> > Cheers, Martin
>> > _______________________________________________
>> > Tagging mailing list
>> > Tagging at openstreetmap.org
>> > https://lists.openstreetmap.org/listinfo/tagging
>>
>


