[Tagging] Data redundancy with "ref" tag on ways vs relations

Tue Jul 31 14:11:26 BST 2012

Hello,

First of all, I'm sorry for the rather long mail, but this is just
another example of what gets me worried about the future of OSM.

This thread is another one of those, where someone came to discuss a
specific problem and proposed a solution, a solution that changes a few
old things. I fear that it will end as usual - a lot of mails back and
forth, and in the end no real answer to the initial problem.

What worries me is that very often in threads like this, two "arguments"
and their variations against the change come up.
1) You are bad, because you try to impose your preferences on others.
2) Relations are complex, we should not use them.

Now I'll try to explain why I think these are not valid arguments
for the discussed topic and should not be used.

ad 1)
It's not imposing, it's called making an argument for the proposal.
If I (or someone else... I'll stick with the first person from now on)
wanted to impose the change on others I would go ahead and apply it
wherever I edit right away without discussion.
Instead of saying "don't impose your views on others", you should
provide an argument why the proposal is bad and, ideally, propose an
alternative solution to the presented problem. This way, I can react
with a counter-argument, or admit that the original proposal was bad,
and after a few iterations a real solution can be reached.

ad 2)
This is actually not an argument against any tagging proposal, but an
argument for improving relation handling in editors. If we use it to
dismiss all tagging proposals, then we're moving in circles: relations
won't be used for tagging, even though (in some cases) they provide a
more flexible, less error-prone and more effective tagging scheme.
And editors will never improve relation editing, because it's not seen
as important, because relations are not used that much.
I don't think relations are as complicated as some people say, but
this is mainly influenced by the fact that I started editing with
Merkaartor (and stuck with it), which has pretty good (although far
from perfect) relation visualization and handling. On the other side of
the spectrum is Potlatch, which makes anything involving relations
overly complicated. I've fixed my share of relation bugs that, I dare
say, came from these poor editing capabilities. I don't want to step on
anyone's toes with this claim. Please don't get me wrong, Potlatch is a
great editor for newbies and even I use it when I want to quickly fix
some minor bug, but relation handling is its obvious weakness.

Now, to the question whether the consistency of data is important or
not. I think it is and here is why:

1) If by consistency you mean "not contradicting" data, I think it's
obvious. We should really aim for minimizing the amount of contradicting
data, because such data is (in most cases) simply useless.
I don't see any benefit in supporting the creation of contradicting
data. If someone sees an error, he should try to fix it, or at least add
a fixme tag saying something like "I think this is wrong, because XXX."
and let someone else fix it instead.

2) If by consistency you mean "doing the same thing the same way across
the world", then I would argue that this is generally a good thing as
well. Of course, there are some cultural or local differences between
regions, and thus total worldwide consistency is probably impossible.
But at least on the most general level, we should try to be as
consistent as possible.
Such an approach makes things a lot easier for data consumers, so that
they don't have to track down all the various tagging schemes for one
feature and try to compile it to a consistent presentable result.
Things get easier for data producers as well. If a newbie comes with the
question "How should I tag this feature?", you can give him a clear
answer, instead of "Well, it's a kind of mess, you can do it like this,
or that, or even that, and some people tag it like this." (And this is
not a hypothetical scenario, I've seen such examples e.g. in our local
talk-cz.) And if the newbie comes with arguments why the suggested way
doesn't fit his needs, we can discuss how to improve or change it.
Consistency is a tool for maximizing the usefulness of the data; that's
why in the Czech Republic we don't use obchod=pekařství, but shop=bakery.

Furthermore, please, don't dismiss the proposals just because "this can
never work across all regions". Even in such a case it might be useful
to discuss it, because maybe somewhere else in the world some other
local community tries to solve the same problem. It's not a bad thing if
those two communities exchange their views and proposed solutions and
try to reach a solution that can be applied in both regions.

I don't think it's healthy for OSM to support the view that local
communities should play in their own backyard and right away dismiss
any attempt to generalize their solution to other parts of the world.
Unfortunately, this is exactly the impression I'm getting from numerous
threads in talk@ or tagging@.

The problem of road tagging has been brought up on talk-cz several times.
The problem is that the current tagging scheme is semantically wrong, e.g.
we have only one primary road number 2, but the OSM data says we have
several hundred of them. The same goes for named residential streets in
cities. This causes several problems.
It makes it hard for data producers to edit the road, because the
information about it is duplicated across several hundred segments.
It makes it hard for data consumers to present the data in a meaningful
way: e.g. if you split the way into three because you want to mark a
bridge over a river, afaik all renderers will draw several identical ref
shields close to each other, because they see three separate highways,
not one (as you would like them to). And a bonus problem: how do you mark
the reference number of the bridge itself? Named residential streets in
cities have a similar problem.
The outcome of the discussion was basically that roads with reference
numbers and names are abstract features and it would be better to map
them that way. The individual segments should then contain only
information like 'this is a bridge', 'here the road has 3 lanes', ...
This issue was indirectly addressed by
http://wiki.openstreetmap.org/wiki/Relations/Proposed/Group_Relation
which basically failed, probably because it was too general a proposal.
So we put the idea of abstracting the roads to rest (again).
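
For readers less familiar with the data model, the duplication can be
sketched in a few lines of Python. This is only an illustration under
stated assumptions, not real OSM tooling: the Way and Relation classes,
the ids and the helper functions are made up, and the relation borrows
the type=route / route=road convention purely as an example of "one
abstract road carrying the ref once".

```python
from dataclasses import dataclass


@dataclass
class Way:
    # Hypothetical, simplified stand-in for an OSM way.
    id: int
    tags: dict


@dataclass
class Relation:
    # Hypothetical, simplified stand-in for an OSM relation.
    id: int
    tags: dict
    members: list  # ids of member ways


# Current scheme: road number 2 was split at a bridge, so every
# segment repeats the ref and looks like a separate "road 2".
ways_current = [
    Way(1, {"highway": "primary", "ref": "2"}),
    Way(2, {"highway": "primary", "ref": "2", "bridge": "yes"}),
    Way(3, {"highway": "primary", "ref": "2", "lanes": "3"}),
]

# Abstract scheme: segments keep only their local properties,
# and one relation states the road's ref exactly once.
ways_abstract = [
    Way(1, {"highway": "primary"}),
    Way(2, {"highway": "primary", "bridge": "yes"}),
    Way(3, {"highway": "primary", "lanes": "3"}),
]
road_2 = Relation(100, {"type": "route", "route": "road", "ref": "2"}, [1, 2, 3])


def ref_copies(ways):
    """Count how many ways carry their own copy of a ref tag."""
    return sum(1 for way in ways if "ref" in way.tags)


def refs_for_way(way_id, relations):
    """Look up the refs a way inherits from the relations it belongs to."""
    return [
        rel.tags["ref"]
        for rel in relations
        if way_id in rel.members and "ref" in rel.tags
    ]


print(ref_copies(ways_current))     # copies to keep in sync today
print(ref_copies(ways_abstract))    # copies on ways in the abstract scheme
print(refs_for_way(2, [road_2]))    # the bridge segment still resolves its road
```

In the second variant a renumbering of the road is a single edit on the
relation, and the ref key on the bridge segment is free for the bridge's
own reference number.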

When I see this thread (and others like it) and all the resistance
(with few arguments) that any proposed change causes at the global OSM
level, I'm starting to think that we (in the Czech Republic and other
communities as well) should simply go ahead and play by our own rules in
our own backyard and just ignore global consistency. And this makes
me sad, because it would lead to globally less useful data.

Best regards,
Petr Morávek

PS: Although I'm responding to Frederik's mail (because it provoked me
to write this long email), it's meant neither exclusively nor primarily
for him.
