[Talk-it] Fwd: [Rebuild] Strategy change for the redaction bot

Simone Cortesi simone at cortesi.com
Sat 2 Jun 2012 20:24:29 BST


An update on the license change, and on why it has not happened yet.


---------- Forwarded message ----------
From: Gnonthgol <gnonthgol at gmail.com>
Date: Sat, Jun 2, 2012 at 9:14 PM
Subject: [Rebuild] Strategy change for the redaction bot
To: rebuild at openstreetmap.org


Work on the license change bot has almost halted over the last
month, with only around 10 commits and no real progress on the main
algorithms. This is partly because people have been occupied with
other stuff (work, life, mapping, final exams, etc.) and partly because
the problems with the algorithms are really hard.

Progress has halted on two problems. If we solve
them we might bump into new problems, or we might just have to
make and test the glue between the bot and the database or history
files and be done. The two hard unsolved problems we have now are:

1) Detecting whether a change to a tag is trivial enough that it
cannot be copyrighted. This includes fixing typos, expanding
abbreviations and normalizing tags.
2) Removing changes in the geometry of the ways and relations.

I have written a bit about the problems, with suggestions to make
them a bit easier.

TRIVIAL TAG CHANGE PROBLEM

The first problem is difficult for computers. It needs
a lot of knowledge about the language used in the tags, like how to
detect misplelings, abbrevs and words of order. In addition, what
counts as a trivial or a significant change to a tag can be subjective
and may start big debates after the redaction bot has run. It is
impossible for any human to know all types of trivial changes in all
languages, and it is even harder for a computer program to get this
right.

We cannot set the limit between trivial and significant
changes too far in either direction, as either way will leave
changes from non-acceptors in the final result. If we mark too many
changes as significant, a trivial change by an acceptor will clean the
tag and cause the information added by the non-acceptor to be included
in the final result. On the other hand, if we mark too many changes as
trivial, then significant changes by non-acceptors can be marked
clean and not be redacted.

What we can do is make the algorithm give three results: "trivial",
"significant" and "don't know". In the latter case it will return a safe
default that will cause the bot to redact as much as possible. If we
do this we might lose more information than is strictly necessary, but
we would not keep any data from non-acceptors.
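
A minimal sketch of the three-result idea, not the bot's actual code.
The abbreviation table, the 0.5 similarity threshold and all function
names are assumptions made up for illustration. The safe default
follows the reasoning above: an unclear change by an acceptor must not
clean the tag, and an unclear change by a non-acceptor must be redacted.

```python
import difflib
from enum import Enum


class Verdict(Enum):
    TRIVIAL = "trivial"
    SIGNIFICANT = "significant"
    UNKNOWN = "don't know"


# Hypothetical table; a real one would be needed per language.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def normalize(value):
    """Lowercase, drop dots and expand the known abbreviations."""
    words = value.lower().replace(".", " ").split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def classify(old, new):
    """Return a Verdict for a tag value changing from old to new."""
    a, b = normalize(old), normalize(new)
    if a == b:
        return Verdict.TRIVIAL
    ratio = difflib.SequenceMatcher(None, a, b).ratio()
    if ratio < 0.5:  # assumed threshold, not tuned on real data
        return Verdict.SIGNIFICANT
    # Similar but not identical: could be a typo fix or a real change.
    return Verdict.UNKNOWN


def counts_as_significant(verdict, author_accepted):
    """Resolve "don't know" to whichever answer redacts the most."""
    if verdict is Verdict.UNKNOWN:
        return not author_accepted
    return verdict is Verdict.SIGNIFICANT
```

With this resolution step the classifier is allowed to be unsure: every
"don't know" costs some data, but never keeps a non-acceptor's
contribution.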

GEOMETRY PROBLEM

The second problem is equally hard for computers.
Geometries can be totally changed by non-acceptors (as the 'reverse
way' and 'sort relation members' functions do), and it can be hard, if
not impossible, to apply the changes made by acceptors to a totally
different geometry. I have not seen any program that can do this. The
closest program that does something similar (patch) cannot handle
these cases and requires human intervention when it finds one.
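
As a rough illustration of the cases named above, the sketch below
labels how a way's node list changed between two versions. The function
name and the labels are hypothetical. It only recognizes a pure reversal
or a pure reordering; anything else is left as "other", which is where
the hard merging problem starts.

```python
def classify_node_list_change(old_nodes, new_nodes):
    """Label how a way's list of node ids changed between versions."""
    if old_nodes == new_nodes:
        return "unchanged"
    if old_nodes == new_nodes[::-1]:
        # Same nodes in exactly the opposite order ('reverse way').
        return "reversed"
    if sorted(old_nodes) == sorted(new_nodes):
        # Same nodes, different order, but not a plain reversal.
        return "reordered"
    # Nodes were added or removed; no mechanical answer here.
    return "other"
```

For example, `classify_node_list_change([1, 2, 3], [3, 2, 1])` gives
"reversed", while a version that swaps node 3 for node 4 gives "other".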

If we get this wrong we will probably mangle geometries around the
world and make them look like vandals have been there and broken
everything (ok, it may not be that bad). We want the final result to
be good data, and not just the best license.

To make a better algorithm for this we need to look at the planet and
see to what extent this is an issue. For a lot of relations it does not
matter what order the members are in, or we can run simple functions on
the data to determine what order they should be in. The problem may be
so small that some remapping effort, or a human doing the redactions on
those objects, is enough. However, if this is a big problem, we need
some really, really smart algorithms that are able to read what a
mapper meant to do with an edit and apply it to the clean geometry.
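
One way the "simple functions" idea could look, as a sketch under
stated assumptions: for relation types where member order carries no
meaning, sort the members into a canonical order before comparing
versions, so a pure reordering is no longer a change at all. The set of
order-insensitive types below is an assumption for illustration, and
the `Member` tuple and function names are hypothetical.

```python
from collections import namedtuple

# Hypothetical member record: element type, element id, role.
Member = namedtuple("Member", "type ref role")

# Assumed list of relation types where member order does not matter.
ORDER_INSENSITIVE_TYPES = {"multipolygon", "boundary"}


def canonical_members(relation_type, members):
    """Return members in a canonical order when order is meaningless."""
    if relation_type in ORDER_INSENSITIVE_TYPES:
        return sorted(members, key=lambda m: (m.type, m.ref, m.role))
    # Order may be meaningful (for example in a route): keep it as-is.
    return list(members)


def same_membership(relation_type, old, new):
    """True if two versions have the same members after canonicalizing."""
    return (canonical_members(relation_type, old)
            == canonical_members(relation_type, new))
```

Running a check like this over the history files would also give a
first count of how many relations are affected only by reordering.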

I want to get some comments on these changes before I start to
redesign the bot. We still need better algorithms for both problems,
but this might remove the requirement that the algorithms have to be
perfect. If we get enough people to work on the bot we can get this
thing done before people get too restless to keep at bay.

Gnonthgol


_______________________________________________
Rebuild mailing list
Rebuild at openstreetmap.org
http://lists.openstreetmap.org/listinfo/rebuild



-- 
-S



More information about the Talk-it mailing list