[Tagging] Data redundancy with "ref" tag on ways vs relations
"Petr Morávek [Xificurk]"
xificurk at gmail.com
Mon Jul 30 23:08:29 BST 2012
Peter Wendorff wrote:
> I'm not talking about data duplication in the meaning of "I add my data
> twice in different ways", but about redundant (not duplicate) data in
> the meaning of "Sven added his data there not knowing that it's possible
> here too; I add the data here - and you can check if we both contributed
> data that doesn't show failures."
OK, but this all still rests on the assumption that there are in fact
two independent data sources. I really don't think this is happening in
There are basically 3 scenarios how you can get the ref tags out of sync:
1) Someone creates a relation with ref=42 and then adds a way with
ref=24. Why would he do it? IMHO there are two possibilities:
a) A mistake during editing - if the road really does not belong
there, then a QA tool analyzing roads should find it relatively easily
(and such a tool would also find e.g. a building polygon added to the
route relation <- THIS you can't do with a simple ref cross-check).
b) It is correct and has some meaning that I can't think of right
now (the simple ref cross-check fails again).
2) A relation exists with member ways without a ref tag. This means that
the route is essentially mapped and any further editor is correcting
errors that he found. Then someone comes and adds a ref tag to one of
the ways - why?
a) He wanted to correct a wrong ref tag. Well, then I think that
person would/should look for the source of that wrong value (the
relation) and correct it. I think this scenario is highly unlikely.
b) Same as 1b). (cross-check again fails)
3) Both relation and ways are populated with ref tags and someone who
wanted to correct a wrong value (e.g. because it's changed) edited only
one of them.
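To make the limits of the simple cross-check concrete, here is a minimal
Python sketch. The Way and Relation classes and both function names are
made up for illustration (they are not any real OSM library), and refs
are compared as plain strings, so multi-valued refs like "42;24" are not
handled.

```python
from dataclasses import dataclass, field


@dataclass
class Way:
    # Hypothetical stand-in for an OSM way: an id plus its tags.
    id: int
    tags: dict = field(default_factory=dict)


@dataclass
class Relation:
    # Hypothetical stand-in for a route relation with way members only.
    id: int
    tags: dict = field(default_factory=dict)
    members: list = field(default_factory=list)


def cross_check_ref(relation):
    """Simple cross-check: member ways whose ref differs from the relation's."""
    rel_ref = relation.tags.get("ref")
    mismatches = []
    for way in relation.members:
        way_ref = way.tags.get("ref")
        # Ways without a ref tag are silently skipped by this check.
        if way_ref is not None and way_ref != rel_ref:
            mismatches.append((way.id, way_ref))
    return mismatches


def non_road_members(relation):
    """Structural check: member ways that are not tagged as a highway."""
    return [way.id for way in relation.members if "highway" not in way.tags]


route = Relation(100, {"type": "route", "ref": "42"}, [
    Way(1, {"highway": "primary", "ref": "42"}),
    Way(2, {"highway": "primary", "ref": "24"}),  # scenario 1
    Way(3, {"building": "yes"}),                  # wrong member, no ref
    Way(4, {"highway": "primary"}),               # scenario 2, no ref
])

print(cross_check_ref(route))   # [(2, '24')]
print(non_road_members(route))  # [3]
```

The cross-check only flags way 2 and cannot say which of the two values
is wrong. The building (way 3) is caught only by the structural check,
which does not need ref tags on the ways at all.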
Could somebody provide a scenario where the data duplication and simple
way-relation cross-check of ref tags is really useful? So far, I can't
> If you create a route relation and add a ref there, that's fine. It's
> correct (as long as you provide correct data of course), and it can be
> used by data consumers.
> If Emil draws his ways and adds a ref tag to it, that's fine too - it's
> correct (...) and can be used by data consumers.
> Neither you nor Emil did wrong stuff, and even if we afterwards have the
> ref on both, that's fine - as explained before.
Oh, OK... Let me clarify my position as well: I do not propose some mass
edit that would wipe out one way of tagging in favor of the other right
now. But I do think that we should reach some consensus about the
desired final state of things and encourage data producers/consumers to
converge on it.
E.g. as Volker Schmidt wrote (wrt hiking routes), it's OK to use the ref tag
on ways, but it doesn't make much sense to keep it there once the
relation is created and maintained.
> You (may) complain that now it's hard to "fix" a bug in it.
> Sure: if the route's ref gets changed, anyone has to fix that both in
> ways and in the relation probably; but if not, we have a contradiction
> that at least can be found in QA tools;
And this contradiction is clearly a negative side effect of data
duplication, because without the duplication this bug would never occur.
Please note that the duplication of ref tags on relation+ways will
never alert you about a ref change in the real world. So, in this use
case the data duplication has only a negative effect on data quality.
Once you've found the no longer valid ref tag, in the case of duplicated
data you must change the relation and all the member ways, which is an
error-prone, boring task. On the other hand, if you keep the ref only on
the relation, it's an easy fix.
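As a rough illustration of the difference in effort, the sketch below
counts how many objects have to be touched when a ref changes. The
function name and the plain-dict representation of tags are assumptions
made for this example only.

```python
def change_ref(relation_tags, member_way_tags, new_ref):
    """Update the ref everywhere it is stored; return the number of edited objects."""
    edited = 0
    if relation_tags.get("ref") != new_ref:
        relation_tags["ref"] = new_ref
        edited += 1
    for tags in member_way_tags:
        # Only ways that duplicate the ref need to be touched.
        if "ref" in tags and tags["ref"] != new_ref:
            tags["ref"] = new_ref
            edited += 1
    return edited


# ref duplicated on the relation and on three member ways
duplicated = change_ref({"ref": "42"}, [{"ref": "42"} for _ in range(3)], "43")

# ref kept only on the relation
relation_only = change_ref({"ref": "42"}, [{} for _ in range(3)], "43")

print(duplicated, relation_only)  # 4 1
```

With duplicated tags the number of edits grows with the number of member
ways, and every way that is missed becomes a new inconsistency.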
Petr Morávek aka Xificurk