[Tagging] Data redundancy with "ref" tag on ways vs relations

Mon Jul 30 20:57:58 BST 2012

Tobias Knerr wrote:
> If two instances are created at least somewhat independently*

This is a really bold assumption. I'm having a hard time imagining a
real-life scenario where this is true.

On the other hand, I can imagine scenarios where the cross-check will
fail simply because someone who edited the way forgot to edit the
relation as well, or vice versa.
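For illustration, here is a minimal sketch of such a cross-check. It assumes a hypothetical, simplified data model: ways and route relations are plain dicts, and a way shared by several routes may carry a semicolon-separated ref such as "3;30". The check can report that the two copies disagree, but not which copy is wrong:

```python
def ref_mismatches(ways, relations):
    """Return (relation_id, way_id, way_ref, relation_ref) for every member
    way whose "ref" tag does not contain its parent relation's "ref".
    The dict layout used here is a hypothetical simplification of OSM data."""
    ways_by_id = {w["id"]: w for w in ways}
    problems = []
    for rel in relations:
        rel_ref = rel["tags"].get("ref")
        if rel_ref is None:
            continue  # relation has no ref, nothing to compare against
        for way_id in rel["members"]:
            way = ways_by_id.get(way_id)
            if way is None:
                continue  # member way not loaded, cannot compare
            way_ref = way["tags"].get("ref", "")
            # A way shared by several routes may carry e.g. ref=3;30.
            way_refs = {r.strip() for r in way_ref.split(";")}
            if rel_ref not in way_refs:
                problems.append((rel["id"], way_id, way_ref, rel_ref))
    return problems


ways = [
    {"id": 1, "tags": {"highway": "primary", "ref": "3"}},
    {"id": 2, "tags": {"highway": "primary", "ref": "30"}},  # way edited, relation not
    {"id": 3, "tags": {"highway": "primary", "ref": "3;30"}},
]
relations = [{"id": 100, "tags": {"type": "route", "ref": "3"}, "members": [1, 2, 3]}]

print(ref_mismatches(ways, relations))  # [(100, 2, '30', '3')]
```

The mismatch on way 2 is found automatically, but whether the way or the relation holds the correct value still has to be decided by a human.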

> However, at this point we can begin to use automated error checking. The
> idea is that errors that can be found automatically are much more
> acceptable than those that cannot.
> 
> With only one instance of the data, none of the errors can be found
> automatically.

You can spot a lot of errors just by doing a simple analysis of the
route graph: are the individual segments continuous? Is the resulting
route a simple linear feature? ... Yes, it's not 100% accurate, but
neither is the alternative (data duplication + cross-checks).
This way you can catch most of the important errors without having to
rely on duplicated data.
I think it's better to spend some time developing more sophisticated
QA tools than to waste it on data duplication.
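A rough sketch of that kind of route graph analysis, assuming a hypothetical input where each member way is given as a list of node ids and consecutive ways meet only at their end nodes. Each way is treated as one edge between its first and last node; the route is "simple linear" if it is connected, no node has degree above two, and there are exactly two loose ends:

```python
from collections import defaultdict


def check_route(ways):
    """ways: list of node-id lists, one per member way (hypothetical input).
    Returns the number of gaps and whether the route is a single path."""
    parent = {}

    def find(n):
        parent.setdefault(n, n)
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    def union(a, b):
        parent[find(a)] = find(b)

    degree = defaultdict(int)
    for nodes in ways:
        start, end = nodes[0], nodes[-1]  # only way endpoints matter here
        degree[start] += 1
        degree[end] += 1
        union(start, end)

    components = len({find(n) for n in degree})
    ends = sum(1 for d in degree.values() if d == 1)
    linear = components == 1 and ends == 2 and all(d <= 2 for d in degree.values())
    return {"gaps": components - 1, "linear": linear}


print(check_route([[1, 2, 3], [3, 4], [4, 5, 6]]))  # {'gaps': 0, 'linear': True}
print(check_route([[1, 2, 3], [4, 5], [5, 6]]))     # {'gaps': 1, 'linear': False}
```

As said above, this is not 100% accurate: dual carriageways or roundabouts legitimately break the "single path" rule, so a real tool would need exceptions for them.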

--
Actually, we have talked about this issue on talk-cz (Czech Republic)
recently. One guy made a simple analysis tool for finding "holes" in our
road network left by the redaction bot: the tool simply collected all
ways with e.g. highway=primary+ref=## and ran some checks on them.
Consequently, the question was raised why we add the ref tag to every
single way, and it was suggested that it would be a good idea to move it
to some parent relation. AFAIK, we don't use (m)any route relations in
our road network yet.
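Such a "holes" check could look roughly like the following sketch. The input format (ways as dicts with "tags" and "nodes") is hypothetical, and this is not the actual talk-cz tool. It groups highway=primary ways by ref and counts how many disconnected pieces each ref forms, treating two ways as connected if they share any node:

```python
from collections import defaultdict, deque


def find_holes(ways):
    """Return {ref: number_of_disconnected_pieces} for highway=primary ways.
    More than one piece is only a hint of a hole, not proof of an error."""
    groups = defaultdict(list)
    for way in ways:
        tags = way["tags"]
        if tags.get("highway") == "primary" and "ref" in tags:
            groups[tags["ref"]].append(way["nodes"])

    pieces = {}
    for ref, node_lists in groups.items():
        # Index which ways touch each node, so neighbours are easy to find.
        ways_at_node = defaultdict(list)
        for i, nodes in enumerate(node_lists):
            for n in nodes:
                ways_at_node[n].append(i)
        seen = set()
        count = 0
        for start in range(len(node_lists)):
            if start in seen:
                continue
            count += 1  # a new, so far unconnected piece
            queue = deque([start])
            seen.add(start)
            while queue:  # breadth-first search over ways sharing nodes
                i = queue.popleft()
                for n in node_lists[i]:
                    for j in ways_at_node[n]:
                        if j not in seen:
                            seen.add(j)
                            queue.append(j)
        pieces[ref] = count
    return pieces


ways = [
    {"tags": {"highway": "primary", "ref": "3"}, "nodes": [1, 2, 3]},
    {"tags": {"highway": "primary", "ref": "3"}, "nodes": [3, 4, 5]},
    {"tags": {"highway": "primary", "ref": "30"}, "nodes": [10, 11]},
    {"tags": {"highway": "primary", "ref": "3"}, "nodes": [7, 8]},  # hole before this way
    {"tags": {"highway": "secondary", "ref": "3"}, "nodes": [5, 7]},  # ignored
]
print(find_holes(ways))  # {'3': 2, '30': 1}
```

Note that this only works because every way carries the ref tag; with the ref moved to a parent relation, the same check would instead walk the relation's members.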

Best regards,
Petr Morávek
