[OSM-dev] Harmless edits (was: Change in wtfe.gryph.de "Quick History Service" API)

Frederik Ramm frederik at remote.org
Sat Dec 3 00:13:14 GMT 2011


Hi,

I have finalized a script that can analyze an object's history and 
determine if certain edits are "non-edits" (i.e. nothing of note was 
changed at all), or "harmless" (i.e. the object was changed and might 
have to be rolled back if the contributor does not agree to the license 
change, but the rollback will likely not affect the quality much).

The idea behind this is to provide some help in prioritizing the 
re-mapping effort. If someone who doesn't agree to the contributor terms 
has made an important contribution then we want to re-map that soon; in 
places where the same guy has just removed a few created_by tags we can 
ignore that for now.

My analysis does not mean that something I classify as "harmless" will 
not be reverted when the license change comes; it might well be. But if 
it gets reverted, the consequences will be negligible.

What I'm doing is basically to look at the object history, identify 
each contributor, and find out:

* have they made at least one "normal" contribution to the object - 
added a node to a way, added or changed a tag, moved a node by more than 
one metre?

* if not, have they made at least one "harmless" contribution - removed 
a tag, a node, or a member; moved a node by less than one metre?

* if not, then they are a "zero contributor" to that object.
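
To make the three classes above concrete, here is a minimal Python 
sketch for a single node edit. It is not taken from harmless.pl; the 
function names, the dict layout, the value 2 for "normal" edits and the 
handling of a move of exactly one metre are assumptions made for 
illustration. It only covers nodes (tags and position), not way or 
relation members.

```python
import math

# Severity 0 and 1 follow the script's output; NORMAL = 2 is an assumed value.
ZERO, HARMLESS, NORMAL = 0, 1, 2


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine, mean Earth radius)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def classify_node_edit(old, new):
    """Compare two consecutive node versions (dicts with lat, lon, tags)."""
    old_tags, new_tags = old["tags"], new["tags"]
    # A tag that is new, or whose value differs, counts as a normal edit.
    added_or_changed = any(
        k not in old_tags or old_tags[k] != v for k, v in new_tags.items()
    )
    removed = any(k not in new_tags for k in old_tags)
    moved = distance_m(old["lat"], old["lon"], new["lat"], new["lon"])
    if added_or_changed or moved > 1.0:
        return NORMAL
    # Assumption: a move of exactly one metre is treated as harmless.
    if removed or moved > 0.0:
        return HARMLESS
    return ZERO
```

For example, dropping a created_by tag without touching anything else 
comes out as HARMLESS, while an edit that leaves tags and position 
unchanged comes out as ZERO.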

We do indeed have a number of "zero contributors", from times where 
different editors had different malfunctions - e.g. for a while, if you 
did a "select all" in JOSM then removed a tag, all objects would be 
marked as changed even if they did not contain the tag, and you would 
appear in the object's edit history even though you never changed it. Or 
Potlatch at some time used to mark a way's member nodes as changed when 
you changed the way.

(If an object is reverted to an earlier state, then all intermediate 
edits count as "zero contributions" as well - they might have been 
valuable but they are not part of the visible object any more.)
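
The revert rule can be sketched roughly like this in Python. Again this 
is an illustration and not the logic of harmless.pl: it assumes each 
version can be reduced to a hashable snapshot of its content, and it 
leaves open how the reverting edit itself is rated.

```python
def intermediate_versions(states):
    """Return indices of versions wiped out by a later revert.

    states: hashable snapshots of the object content, oldest first
    (hypothetical representation, e.g. a tuple of sorted tags).
    """
    wiped = set()
    last_seen = {}
    for k, state in enumerate(states):
        if state in last_seen:
            # Everything between the earlier identical state and this one
            # is no longer part of the visible object.
            wiped.update(range(last_seen[state] + 1, k))
        last_seen[state] = k
    return wiped
```

With the states A, B, C, A, the versions holding B and C are reported 
as intermediate and would count as zero contributions.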

You can try out my script here, by adding a way/node/relation id to the 
URL like so:

http://wtfe.gryph.de/harmless/way/40103577

The output is a break-down of what my script thinks has happened to the 
object, and which edits are zero-edits ("severity: 0") or harmless 
("severity: 1"). After the version analysis, it summarizes the user 
contributions - each user is assigned the highest severity of all their 
changes.
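
The per-user summary amounts to taking a maximum over each user's 
edits. A small Python sketch, assuming the per-version analysis yields 
(user id, severity) pairs; the function name and input format are made 
up for this example:

```python
def summarize_users(edits):
    """edits: iterable of (user_id, severity); returns {user_id: highest severity}."""
    summary = {}
    for user, severity in edits:
        # Keep the highest severity seen so far for this user.
        summary[user] = max(severity, summary.get(user, 0))
    return summary
```

A user with one harmless edit and several zero-edits on the same object 
therefore ends up with severity 1 for that object.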

The most important output of my script is when it finds that an object 
that currently looks "tainted", because someone who does not agree to 
the license change has touched it, is not really problematic at all 
because the change in question was harmless.

This is the case in the above "way 40103577" example. The version 
history contains an edit by non-agreeing user 263596, therefore the 
whole object looks problematic. My script finds out that this edit is 
simply a tag deletion, and because all other edits are by people who 
have agreed to the license change, the object does not have to be a top 
priority for remapping.
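
Put as code, the decision for such an object might look like the 
following Python sketch. The function name, the severity value 2 for a 
"normal" edit and the user id 1001 are hypothetical; only user 263596 
and the severities 0 and 1 come from the example above.

```python
def needs_priority_remap(user_severity, non_agreeing, harmless=1):
    """True if any non-agreeing user contributed more than a harmless edit.

    user_severity: {user_id: highest severity on this object}
    non_agreeing:  set of user ids who have not agreed to the license change
    """
    return any(
        severity > harmless
        for user, severity in user_severity.items()
        if user in non_agreeing
    )


# Way 40103577 as described: the non-agreeing user only deleted a tag.
# User 1001 stands in for the agreeing contributors (hypothetical id).
way_summary = {263596: 1, 1001: 2}
print(needs_priority_remap(way_summary, non_agreeing={263596}))
```

This prints False, i.e. the way is not a top priority for remapping.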

Everyone is invited to play with this script and see what happens. I 
plan to make this the basis of the v2 WTFE service, meaning that in the 
future editors will likely *not* highlight stuff that my script deems 
harmless.

Here's the - hacky, perly - source code: http://wtfe.gryph.de/harmless.pl

Please don't do mass evaluations with this web service, as it runs a 
"history" query against the API in the backend and this is quite costly. 
If you want to run this on a large area, download the .pl file and make 
yourself a full history extract with Peter Koerner's history splitter, 
then run the perl script on the XML. It can process anything up to the 
complete planet if you have the patience.

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00'09" E008°23'33"


