[OSM-dev] Harmless edits (was: Change in wtfe.gryph.de "Quick History Service" API)
frederik at remote.org
Sat Dec 3 00:13:14 GMT 2011
I have finalized a script that can analyze an object's history and
determine if certain edits are "non-edits" (i.e. nothing of note was
changed at all), or "harmless" (i.e. the object was changed and might
have to be rolled back if the contributor does not agree to the license
change, but the rollback will likely not affect the quality much).
The idea behind this is to provide some help in prioritizing the
re-mapping effort. If someone who doesn't agree to the contributor terms
has made an important contribution then we want to re-map that soon; in
places where the same guy has just removed a few created_by tags we can
ignore that for now.
My analysis does not mean that something I classify as "harmless" will
not be reverted when the license change comes; it might well be. But if
it gets reverted, the consequences will be negligible.
What I'm doing, basically, is to look at the object history, identify each
contributor, and find out:
* have they made at least one "normal" contribution to the object -
added a node to a way, added or changed a tag, moved a node by more than
one metre?
* if not, have they made at least one "harmless" contribution - removed
a tag, a node, or a member; moved a node by less than one metre?
* if not, then they are a "zero contributor" to that object.
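A minimal Python sketch of the per-edit classification described above follows. It is not the logic of harmless.pl; it assumes each version is a dict with hypothetical keys "tags", "nodes" (ways), "members" (relations) and "lat"/"lon" (nodes), uses a spherical haversine distance for node moves, and treats a pure reordering of way nodes as no change, which is a simplification.

```python
import math

# Severity levels matching the three categories above (hypothetical constants).
ZERO, HARMLESS, NORMAL = 0, 1, 2

def distance_m(lat1, lon1, lat2, lon2):
    """Haversine distance in metres, assuming a spherical Earth."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def edit_severity(prev, cur, threshold_m=1.0):
    """Classify the change from version `prev` to version `cur`."""
    severity = ZERO
    old_tags, new_tags = prev.get("tags", {}), cur.get("tags", {})
    # An added tag or a changed tag value is a normal contribution.
    for key, value in new_tags.items():
        if old_tags.get(key) != value:
            return NORMAL
    # A tag that was only removed counts as harmless.
    if set(old_tags) - set(new_tags):
        severity = HARMLESS
    # Way nodes / relation members: adding is normal, removing is harmless.
    for field in ("nodes", "members"):
        old, new = prev.get(field, []), cur.get(field, [])
        if any(item not in old for item in new):
            return NORMAL
        if any(item not in new for item in old):
            severity = HARMLESS
    # Node position: more than the threshold is normal, a smaller move harmless.
    if "lat" in prev and "lat" in cur:
        d = distance_m(prev["lat"], prev["lon"], cur["lat"], cur["lon"])
        if d > threshold_m:
            return NORMAL
        if d > 0:
            severity = HARMLESS
    return severity
```

For example, removing only a created_by tag yields HARMLESS, and moving a node at 49°N by 0.000005° of latitude (roughly half a metre) also yields HARMLESS.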
We do indeed have a number of "zero contributors", from times when
different editors had different malfunctions. For example, for a while, if
you did a "select all" in JOSM and then removed a tag, all objects would be
marked as changed even if they did not contain the tag, and you would
appear in the object's edit history even though you never changed it. Or
Potlatch at some point used to mark a way's member nodes as changed when
you changed the way.
(If an object is reverted to an earlier state, then all intermediate
edits count as "zero contributions" as well - they might have been
valuable but they are not part of the visible object any more.)
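One way to detect such reverts is sketched below in Python. It is an assumption, not necessarily how harmless.pl does it: a version counts as a revert when its content is exactly equal to an earlier version's content, and every version in between is then marked as a zero contribution. States should hold only content (tags, geometry), not metadata such as version numbers, or no two states will ever compare equal.

```python
def reverted_versions(states):
    """Return 0-based indices of versions whose changes were later undone.

    `states` lists the object's content in version order; each entry must be
    comparable with == (for example a dict of tags and geometry).
    """
    zero = set()
    for i, state in enumerate(states):
        # Find the most recent earlier version with identical content.
        for j in range(i - 1, -1, -1):
            if states[j] == state:
                # Version i restored version j, undoing everything in between.
                zero.update(range(j + 1, i))
                break
    return zero
```

With states A, B, C, A, D, the fourth version restores the first, so the edits at indices 1 and 2 are reported as zero contributions.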
You can try out my script here, by adding a way/node/relation id to the
URL like so:
The output is a break-down of what my script thinks has happened to the
object, and which edits are zero-edits ("severity: 0") or harmless
("severity: 1"). After the version analysis, it summarizes the user
contributions: each user is assigned the highest severity of all his edits.
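The per-user summary amounts to taking a maximum. A minimal Python sketch, assuming edits arrive as hypothetical (user_id, severity) pairs with 0 = zero edit, 1 = harmless and 2 = normal:

```python
def summarize_contributors(edits):
    """Map each user to the highest severity among all of their edits."""
    summary = {}
    for user, severity in edits:
        # Keep the most severe contribution seen so far for this user.
        summary[user] = max(severity, summary.get(user, severity))
    return summary
```

A user with one normal edit and several zero edits therefore ends up with severity 2.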
The most important thing my script can find is an object that currently
looks "tainted", because someone who does not agree to the license change
has touched it, but that is not really problematic at all because the
change in question was harmless.
This is the case in the above "way 40103577" example. The version
history contains an edit by non-agreeing user 263596, therefore the
whole object looks problematic. My script finds out that this edit is
simply a tag deletion, and because all other edits are by people who
have agreed to the license change, the object does not have to be a top
priority for remapping.
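The distinction between "looks tainted" and "needs priority remapping" can be sketched in Python as follows. The input is a per-user severity summary (0, 1 or 2) and a set of users who agreed to the license change; both shapes, and user 111 in the example, are hypothetical.

```python
NORMAL = 2  # hypothetical severity value for a "normal" contribution

def looks_tainted(user_severity, agreed_users):
    """True if any non-agreeing user appears in the history at all."""
    return any(user not in agreed_users for user in user_severity)

def needs_priority_remap(user_severity, agreed_users):
    """True only if a non-agreeing user made a normal contribution."""
    return any(
        severity >= NORMAL and user not in agreed_users
        for user, severity in user_severity.items()
    )
```

For a summary like {111: 2, 263596: 1} where only user 111 agreed, the object looks tainted, but it is not a remapping priority because user 263596's edit is only harmless.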
Everyone is invited to play with this script and see what happens. I
plan to make this the basis of the v2 WTFE service, meaning that in the
future editors will likely *not* highlight stuff that my script deems harmless.
Here's the - hacky, perly - source code: http://wtfe.gryph.de/harmless.pl
Please don't do mass evaluations with this web service, as it runs a
"history" query against the API in the backend and this is quite costly.
If you want to run this on a large area, download the .pl file and make
yourself a full history extract with Peter Koerner's history splitter,
then run the perl script on the XML. It can process anything up to the
complete planet if you have the patience.
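For offline use, the versions in a history XML extract first have to be grouped per object. A rough Python sketch using the standard library's ElementTree.iterparse follows, assuming the usual OSM XML layout (node/way/relation elements with version and uid attributes and tag, nd and member children). Unlike a streaming Perl script, it keeps every version in memory, so for very large extracts one would process each object as soon as its last version has been read.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

def read_history(source):
    """Group versions from an OSM history XML file by (type, id).

    `source` is a filename, a binary file object, or raw bytes. Each object
    maps to a list of (version, uid, content) tuples sorted by version.
    """
    if isinstance(source, bytes):
        source = io.BytesIO(source)
    history = defaultdict(list)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag not in ("node", "way", "relation"):
            continue
        content = {"tags": {t.get("k"): t.get("v") for t in elem.findall("tag")}}
        if elem.tag == "node" and elem.get("lat") is not None:
            # Deleted node versions carry no coordinates.
            content["lat"] = float(elem.get("lat"))
            content["lon"] = float(elem.get("lon"))
        elif elem.tag == "way":
            content["nodes"] = [nd.get("ref") for nd in elem.findall("nd")]
        elif elem.tag == "relation":
            content["members"] = [
                (m.get("type"), m.get("ref"), m.get("role"))
                for m in elem.findall("member")
            ]
        key = (elem.tag, elem.get("id"))
        history[key].append((int(elem.get("version")), elem.get("uid"), content))
        elem.clear()  # drop this element's children now that they are read
    for versions in history.values():
        versions.sort(key=lambda v: v[0])
    return dict(history)
```

Keeping metadata (version, uid) apart from the content dict lets consecutive content dicts be compared directly when classifying edits or spotting reverts.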
Frederik Ramm ## eMail frederik at remote.org ## N49°00'09" E008°23'33"