[Tagging] Data redundancy with "ref" tag on ways vs relations

Peter Wendorff wendorff at uni-paderborn.de
Mon Jul 30 22:05:46 BST 2012


On 30.07.2012 20:11, "Petr Morávek [Xificurk]" wrote:
> Hi Peter,
>
> Peter Wendorff wrote:
>> I think, this would lead to a situation where the error count doesn't
>> decrease, but the remaining errors aren't detectable any more.
>>
>> Having refs only on relations means for a data consumer: I have to use
>> this data and I have no idea if it's correct - I have to assume it is to
>> use it.
>> Same for refs only on ways.
> This is a bit absurd argument. We should _not_ duplicate the data
> between relations and their members just that we could cross-check them.
> Almost any data duplication is wrong, because it's harder to keep the
> data synchronized, and thus it leads to more errors.
Yes, I realize that I was a little bit fuzzy here.
I'm not talking about data duplication in the sense of "I add my data 
twice in different ways", but about redundant (not duplicate) data in 
the sense of "Sven added his data there, not knowing that it's possible 
here too; I add the data here - and you can check whether the data we 
both contributed contradicts each other."
> If I create/modify a route relation, it's soo much fun to copy the ref
> tags from the relation to ways. I'm a lazy person, so if someone tells
> me that this is what I'm supposed to do, I'll just write script for
> automating this enjoyable task (effectively canceling the benefit of
> data duplication for cross-checks).
No, you are not supposed to do that - and I didn't say so.
On the contrary: if you did, that is exactly what would make the 
heuristic approach to finding errors less productive.
Nobody forbids it, sure; and I'm fine with any mapper deciding to put 
refs only on ways or only on relations.
But I object to deciding that one of these should not be done any more - 
because I don't see the benefit in it.

If you create a route relation and add a ref there, that's fine. It's 
correct (as long as you provide correct data of course), and it can be 
used by data consumers.
If Emil draws his ways and adds a ref tag to them, that's fine too - it's 
correct (...) and can be used by data consumers.
Neither you nor Emil did anything wrong, and even if we afterwards have 
the ref on both, that's fine - as explained before.

You may complain that it's now hard to "fix" a bug in it.
Sure: if the route's ref gets changed, someone probably has to fix that 
both on the ways and on the relation; but if nobody does, we have a 
contradiction that at least can be found by QA tools; and this kind of 
change usually doesn't happen very often, so it's not that much work later.
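
To illustrate the kind of check a QA tool could run here, a minimal 
sketch in Python. Everything in it is an assumption made for 
illustration: the data layout (plain dicts keyed by id), the names 
split_refs and find_ref_conflicts, the sample refs, and the rule that a 
way's ref may list several values separated by ";". It is not taken 
from any existing tool.

```python
def split_refs(value):
    """Split a semicolon-separated ref value into a set of stripped refs."""
    return {part.strip() for part in value.split(";") if part.strip()}


def find_ref_conflicts(relations, ways):
    """Return (relation_id, way_id, relation_ref, way_ref) for contradicting refs.

    Only member ways that carry their own ref are compared. A missing ref
    on either side is not a contradiction, just a lack of redundancy.
    """
    conflicts = []
    for rel_id, rel in sorted(relations.items()):
        rel_ref = rel["tags"].get("ref")
        if rel_ref is None:
            continue
        for way_id in rel["members"]:
            way_ref = ways.get(way_id, {}).get("ref")
            if way_ref is None:
                continue
            # The way agrees if the relation's ref is one of its values.
            if rel_ref.strip() not in split_refs(way_ref):
                conflicts.append((rel_id, way_id, rel_ref, way_ref))
    return conflicts


# Hypothetical sample data: one route relation and its three member ways.
relations = {
    100: {"tags": {"type": "route", "ref": "B 1"}, "members": [1, 2, 3]},
}
ways = {
    1: {"highway": "primary", "ref": "B 1"},
    2: {"highway": "primary", "ref": "B 1;B 64"},
    3: {"highway": "primary", "ref": "B 7"},
}

print(find_ref_conflicts(relations, ways))
# prints [(100, 3, 'B 1', 'B 7')]
```

The check does not say which side is wrong; it only reports that way 3 
and relation 100 cannot both be right, which is the point above.
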
>> refs on both means: I am free to use this or that
> Wrong. You cannot use either, because as you wrote below - you don't
> know for sure which of the values is correct.
I don't know which is correct - nor do I know whether either of the 
values is correct, but that's no different from before.
Now I know that contradicting values cannot both be correct - before, I 
didn't know and assumed the single value was correct for lack of 
alternatives.

What's better: to know that something is wrong, or to believe that 
something is right as long as nobody claimed otherwise?
If we come to a point where most data is contradictory as soon as it's 
mapped twice, we have a much bigger problem with OSM as a whole, because 
then it's roulette wherever our data is used.

regards
Peter


