[OSM-talk] Code of conduct for automated (mass-) edits

Tue Sep 30 13:58:57 BST 2008

Sorry for breaking the email history, I just subscribed to this list.

Frederik Ramm wrote:
> Another issue is, *if* something is changed, *how* this is done. Lacking 
> 0.6's versioning, if anyone analyzes yesterday's planet file to find 
> ways he'd like to fix and uploads changed versions of each, chances are 
> he'll overwrite all those that have been changed between the generation 
> of the planet file and his script run. Whoever wants to run an automated 
> update should know exactly what he's doing, and be in a position to 
> exactly revert his changes should it turn out they were faulty.

When on the Italian mailing list we recently agreed on the
capitalization convention for the street type (using Via, Viale etc
instead of via and viale, basically the same convention as explained on the
wiki about using the Street capitalization), the risks of doing the mass
edits prompted me to write a tool that reduced the risk as far as
possible with the current db API. The tool basically downloads the
latest version of each node/way from the db at the time of the upload
and checks that the fields/tags that need changing still have the old
value before the change is applied. All the other tags are left
unchanged. This way the data overwrite race lasts just a fraction of a
second.
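
A minimal sketch of the check-then-update idea described above, written in
Python for illustration (it is not the code from the tarball). The names
apply_if_unchanged, safe_update, fetch and upload are hypothetical; fetch and
upload stand in for whatever calls talk to the db API.

```python
from typing import Callable, Dict, Optional

Tags = Dict[str, str]


def apply_if_unchanged(current: Tags, key: str, old: str, new: str) -> Optional[Tags]:
    """Return a copy of `current` with `key` set to `new`, or None if the
    tag no longer holds the expected `old` value (someone edited it)."""
    if current.get(key) != old:
        return None
    updated = dict(current)  # all other tags are carried over unchanged
    updated[key] = new
    return updated


def safe_update(
    obj_id: int,
    key: str,
    old: str,
    new: str,
    fetch: Callable[[int], Tags],          # hypothetical: download latest tags
    upload: Callable[[int, Tags], None],   # hypothetical: upload new tags
) -> bool:
    """Fetch the latest version right before uploading, so the window in
    which another edit could be overwritten is as short as possible."""
    current = fetch(obj_id)
    updated = apply_if_unchanged(current, key, old, new)
    if updated is None:
        return False  # skip: the object changed since the check was run
    upload(obj_id, updated)
    return True
```

A skipped object (return value False) is simply left alone, which is the safe
outcome when someone else has touched the tag in the meantime.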

I suggest people use the same approach in their scripts or
simply use my code/tool. It can be downloaded from:
	http://www.oddwiz.org/~lupus/osm/osm-helpers-0.3.tar.gz

I didn't announce it here yet because I don't have much time to deal
with a large user base, but this thread seemed like a good enough
reason.
People may also find the included osm-history tool useful
as it displays the changes in a db object in a diff-like way
(I always found it very hard to spot just the differences between
versions by looking at all the data shown on the web page).
Note that the tools should work on Windows/OSX, but I only tested them with
Mono on Linux. Feel free to mail me any bug report (and always remember
to check the changes file before uploading it).

> And still another thing is documentation; I somewhat expect that any 
> automated, large-scale change should be documented. When was it done, 
> what exactly was done, how many objects were affected, what were the 
> "source" and/or username settings for the job so that it can be 
> identified later.

The tools that I wrote work this way:
1) a checker tool generates a changes file; this includes the object id
and the new/old values for the changed tags.
2) the changes file can be inspected for errors/mistakes etc.
3) another tool takes the changes file and updates the objects with the
protocol explained above.
4) the same upload tool can optionally notify the last contributor of
the changed objects: the message will include the object id and the
old/new values.

So both the changes file and the email notification serve as the
'documentation' of the changes.
More details are in the README file in the tarball.
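
To illustrate step 1, here is a rough Python sketch of a checker that produces
change records for the street type capitalization case. It is only an
assumption of how such a checker might look: the Change record, the
tab-separated output and the STREET_TYPES list (just the two examples named
above) are made up for this example and are not the format used by my tools.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Change(NamedTuple):
    obj_type: str  # "node" or "way"
    obj_id: int
    key: str
    old: str
    new: str


# Only the two street types mentioned in this mail; a real list would be longer.
STREET_TYPES = ("via", "viale")


def check_names(objects: Iterable[Tuple[str, int, Dict[str, str]]]) -> List[Change]:
    """Collect a change for every name that starts with a lowercase street type."""
    changes = []
    for obj_type, obj_id, tags in objects:
        name = tags.get("name", "")
        first, sep, rest = name.partition(" ")
        if sep and first in STREET_TYPES:
            new_name = first.capitalize() + sep + rest
            changes.append(Change(obj_type, obj_id, "name", name, new_name))
    return changes


def format_changes(changes: Iterable[Change]) -> str:
    """One tab-separated line per change, easy to read and to diff."""
    return "\n".join("\t".join(str(field) for field in c) for c in changes)
```

Because each record carries both the old and the new value, the same file can
be reviewed before the upload and kept afterwards as a record of what was done.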

> 1. Make a plan of what you want to change, and discuss in relevant forum 
> (usu. mailing list). If there are many objections; drop the plan. If 
> there are few objections, maybe exempt certain areas or objects created 
> by certain people in order to respect their objections. Remember that 
> they can easily change things back again if you act against their will, 
> so don't even try to play the superiority card.
> 2. Make sure your tools and knowledge are good: You have to be able to 
> revert your changes if something goes wrong, and you need to keep any 
> collateral damage to an absolute minimum. If you cannot guarantee that, 
> ask someone for help who can.

I think my osm-upload-changes tool involves the minimum amount of
overwriting risk as allowed by the db API. Also, separating the
checker and the upload tool allows (or favours) inspecting the
changes file for mistakes or unintended changes. The changes file
shows just the changes, so it can be easily read (vs looking at a huge
osm file with all the data, even the data that is unchanged).

> 4. Provide documentation that tells people what exactly you have done.

The wiki message notification in my tool properly documents the changes.
Keeping the changes file around also helps with that if there is any
issue in the future.

That said, I agree with the other points in the email.
I hope that my code will help people write more responsible bots,
but I think some changes at the project level will be needed to better
deal with script usage (or simply with mistakes happening in GUI editors).

The first requirement, IMHO, is a better way to communicate with other
users. I would be in favor of sharing the email address of users
to make this easier: currently my tool has to parse the web pages
and submit web forms. There is no reason to make communication harder,
so let's allow sharing the email address with other (authenticated) users.
With proper communication all the changes to the objects could be
notified (the current web interface would be a nightmare with many
messages).

The other requirement would be an API to query the db for changes by
user and/or time range. I haven't looked at the db structure or server
code, so I have no idea how much work is involved for this. It is
something, though, that would be compatible with the 0.5 API as it is
a separate query and doesn't involve any incompatible changes.

lupus

-- 
-----------------------------------------------------------------
lupus at debian.org                                     debian/rules
lupus at ximian.com                             Monkeys do it better