[Tagging] General tagging system problems

Daniel Koć daniel at xn--ko-wla.pl
Wed May 13 12:06:25 UTC 2015


On 13.05.2015 10:46, Martin Koppenhoefer wrote:

> I'm not convinced that such a generic approach will help to get
> unambiguous tagging, e.g.

Preface first (sorry, just jump to the *** section if not interested =} 
)...

One thing is certain for me - there's no way to avoid ambiguity when
dealing with the whole planet; we just have several options for handling
it. The key strategies are:

1. You can have a "phone book"/"bible" with ready-made recipes.
2. You can let people pick things as they like.

and currently we have 3. - we let people tag as they like, then we try
to compile a phone book from the result.

1. is good only if you know all the cases beforehand - and that is
clearly not the case in OSM. 2. is good if you start from scratch
and try to assume nothing - but that gives you no uniformity, and it is
no longer our situation anyway, because we already have a big database.

3. is a middle ground and sounds rather good, and it was probably useful
to some extent, but it doesn't scale anymore, because the resulting
phone book is still incomplete and yet already overlapping.

What could we do then? 4. Reimplementation. This means we take the 
knowledge from the database and try to describe it in a different 
(hopefully better and more efficient) way.

Of course we could turn it into 1. - a new, shiny and complete
(finally!) book of tags - but we're still growing in terms of the things
we cover, so that would be premature and too rigid, I guess.

It's better to find the common threads (more basic objects/ideas) and
then look at how they relate to each other, so that we have a tree of
such basic things. Moreover, since we know that we're working with
conventions and definitions, not the real objects, we can conventionally
say that well-known complex objects map 1:1 to some specific
combinations - and that won't be hard, since we extracted the basic
"bricks" exactly from them!

***

Now, back to your question: so what about those well-known objects?

Well, we know that food+service may be a restaurant or fast food or some
other such place. We know what they have in common, but we need to know
what the difference is. If we think it's the time of payment, we can
just add it and voilà:

food + service + payment:before = fast food
food + service + payment:after = restaurant

This is the same as what we have in the current definitions, but the
difference is that we take the responsibility (of sticking to a
complicated and not always exclusive definition) away from the mapper
and move it to the tag/definition crafting dept., which is more focused,
more competent and sees the whole picture.

Sure, it's still a problem, because we need to really know what the
difference between the objects is, but finding such things is exactly
our job here (on the tagging list and on the wiki). The mappers should
just describe the simple ground truth - they can see that it's shop +
service + food, and we have to decide how to treat that thing; they can
also be unsure and leave it as just "food", rather than faking it with
random, misleading "fast food/restaurant" picks (as they surely do now!).
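
To make this concrete, here is a minimal sketch in Python of how such
definitions could be looked up. Everything in it is an assumption made
for illustration: the brick names come from the examples above and are
not existing OSM tags, and DEFINITIONS, classify and candidates are
hypothetical names, not part of any tool.

```python
# Hypothetical table: each well-known object is a fixed combination of
# basic "bricks". The brick names are illustrative, not real OSM tags.
DEFINITIONS = {
    "fast food": frozenset({"food", "service", "payment:before"}),
    "restaurant": frozenset({"food", "service", "payment:after"}),
}


def classify(bricks):
    """Return the names whose whole definition is present in the bricks."""
    observed = frozenset(bricks)
    return sorted(
        name for name, needed in DEFINITIONS.items() if needed <= observed
    )


def candidates(bricks):
    """Return the names still possible for an incomplete description."""
    observed = frozenset(bricks)
    return sorted(
        name for name, needed in DEFINITIONS.items() if observed <= needed
    )


if __name__ == "__main__":
    # A mapper who saw when the payment happens gets a definite answer.
    print(classify(["food", "service", "payment:before"]))
    # A mapper who is not sure records only "food"; nothing is faked,
    # and both options stay open until somebody adds more detail.
    print(classify(["food"]))
    print(candidates(["food"]))
```

The point of the two functions is that the decision lives in the table,
not in the mapper's head: an incomplete description simply matches
nothing yet, instead of forcing a random pick.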

I think it's like constructing the genetic tree of birds - it may sound
strange that falcons and eagles are not as close as they look in the
"common sense" classification, but it lets us follow the hidden reality
better, and the benefit is that we can more easily identify where some
new species belongs on that tree:

http://www.scientificamerican.com/article/graphic-science-the-bird-family-tree-gets-a-makeover/

> There's a reason we don't use just a few words in our natural language
> but rather a plethora of specific words plus grammar (word order etc.)
> to express ourselves. If a few words would suffice we wouldn't have
> developed all those words ;-)

I agree 100% with you! But I think that our words are too long and
inflexible. We try to force mappers to remember whole proper
sentences, just because it sounds natural. I'd like to chop the words
into an easy set of (sometimes strange) pieces and let people recombine
them. By the way, we think that words are the most basic thing in
language, but look at this example:

inflexible = not + able + (to) flex

That also sounds natural, though it's not always the case. But the rule
is "if we know the definition well, we can spell the same thing with
more basic elements", so we can look at the wiki:

supermarket = (A) large + store (for) + groceries (and other goods.)

That gives us "large + shop + groceries" as a simple, conventional
definition of supermarket (shop=supermarket was an attempt to imply a
hard categorization, but that worked only on a smaller scale).

"Supermarket" is a much handier name, so we can still use it (of
course!), but once we have a clear picture of what it really is, we can
compare it directly with other shops (like: small + shop + groceries =
convenience) and let people create tagging for new kinds of shops more
easily and more coherently at the same time.
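
A minimal sketch in Python of such a direct comparison, assuming the two
definitions given in this message. SHOPS and compare are hypothetical
names chosen for illustration, and the bricks are not existing OSM tags.

```python
# Hypothetical definitions taken from the examples in this message.
SHOPS = {
    "supermarket": frozenset({"large", "shop", "groceries"}),
    "convenience": frozenset({"small", "shop", "groceries"}),
}


def compare(a, b):
    """Split two definitions into shared bricks and bricks unique to each."""
    first, second = SHOPS[a], SHOPS[b]
    return {
        "common": sorted(first & second),
        a: sorted(first - second),
        b: sorted(second - first),
    }


if __name__ == "__main__":
    # Shows what the two kinds of shop share and where they differ.
    print(compare("supermarket", "convenience"))
```

With the definitions spelled out like this, the only difference between
the two shops is the size brick, and a new kind of shop can be described
by changing or adding a single brick.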

-- 
"The train is always on time / The trick is to be ready to put your bags 
down" [A. Cohen]


