[OSM-talk] Automated edits code of conduct

tuxayo victor at tuxayo.net
Sun Jul 17 22:01:29 UTC 2016


On 14/07/2016 11:14, Éric Gillet wrote:
> tuxayo:
>>     A fourth approach to fix that would be to have a first automated edit
>>     changeset and then a manual fix changeset for the other errors.
>>     A variant would be to reverse the order: fix the other errors first when
>>     inspecting the selected/searched objects to be automatically edited. And
>>     then doing the automated edit.
> 
> 
> That would be slightly faster to execute than the first approach I was
> suggesting, but then how would you prove that you checked every and all
> features ?

What do you mean by check?
In the approaches where one would also fix other errors, there would be
no obvious errors in the automated part either, because checking for
other errors implies a basic check that the automated edit is also good.
A further level of checking would be on the ground, but we are talking
about an automated edit, and at that point we are to some extent no
longer doing a purely automated edit.
That's why I'm not sure what you meant by checking.

-----

On 14/07/2016 13:36, Warin wrote:
> One could upload the data for each feature - one changeset = one
> feature. That would at least show a time lag between each. Of course
> this would impose a larger load on both the contributor and the OSM web
> connection, but if that would avoid the continued accusation of
> 'mechanical edit' then so be it.

The point is how to still be able to perform automated edits, because
there are small, simple fixes that need to be done by the hundreds or
thousands if we hope to prevent such errors from accumulating in the
database. (e.g. color=* → colour=*)
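To make the color=* → colour=* example concrete, here is a minimal, hypothetical sketch (in Python, not an approved mechanical edit) of how such a tag rename could be done conservatively: elements where both tags already exist with conflicting values are set aside for human review instead of being touched automatically. The element representation (dicts with a "tags" key) is an assumption for illustration, not any particular OSM library's API.

```python
def rename_color_tag(elements):
    """Sketch of a conservative color= -> colour= rename.

    Returns (fixed, needs_review): elements edited automatically,
    and elements with conflicting values left for a human.
    """
    fixed, needs_review = [], []
    for el in elements:
        tags = el["tags"]
        if "color" not in tags:
            continue  # nothing to do
        if "colour" in tags and tags["colour"] != tags["color"]:
            # Conflicting values: do not touch automatically.
            needs_review.append(el)
            continue
        tags["colour"] = tags.pop("color")
        fixed.append(el)
    return fixed, needs_review
```

Even for a fix this simple, the "needs_review" bucket is where the manual-inspection part of the approaches discussed above would come in.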


-----

On 14/07/2016 14:32, Andy Townsend wrote:
> My first changeset discussion comment on suspicious edits is often
> "you've changed X to Y, but to me it looks like a Z; are you sure?" for
> exactly that reason.  If I can pick somewhere that I'm familiar with as
> an example, I'll use that.  If someone doesn't answer the question and
> instead replies "But the wiki says ..." then clearly we've got a problem.

So that would be a case where the wiki says that X *must* be a Y,
but where it's possible that it's actually a Z.
I'm not sure I understand the situation: that would be a wiki
interpretation error from the other contributor, right?

-----

On 14/07/2016 17:19, Éric Gillet wrote:
> However I'd believe that there is (in Europe for the example's sake) a
> very low number of restaurant really named McDonalds and not part of the
> franchise. So if the changeset correct 300 restaurants but 2 are
> "damaged" by the automated edit, would the edit be bad enough to be
> reverted or not be done in the first place ?

If I understand correctly it's a particular instance of the more generic
case:
we know in advance that we'll introduce errors in a very small
percentage of the data (here due to the variability and surprises in
the names).
But for more than 99.99% of the data, an error will be fixed. Is this an
acceptable trade-off?

For the generic case I don't know, but for the restaurant name example I
think it's a bad idea, because the few restaurants that really are named
McDonalds would likely get automatically edited and then reverted, over
and over.
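One hypothetical safeguard against that edit/revert loop (a sketch, not something proposed in this thread) would be an exclusion list of elements that a human has verified as genuine exceptions, which every subsequent run of the automated edit must skip. The element representation and IDs below are made up for illustration.

```python
# Manually verified exceptions: (element type, element id) pairs that
# the automated edit must never touch again. Example IDs only.
VERIFIED_EXCEPTIONS = {("node", 12345), ("way", 67890)}

def should_edit(element):
    """Skip elements a human has already verified as genuine exceptions."""
    return (element["type"], element["id"]) not in VERIFIED_EXCEPTIONS
```

Of course this only works if the exclusion list is maintained alongside the edit, which is one more reason such edits need a documented, discussed procedure rather than a one-off script.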

-----

On 14/07/2016 17:35, Richard Fairhurst wrote:
> The answer to which is, of course, it depends. For some automated edits the
> collateral damage will be too great, for others it may sometimes be
> acceptable. 

> The person proposing the automated edit isn't the best placed
> person to weigh that up: they're already convinced of the desirability of
> the edit (which is why they're proposing it).

The person already weighed that up when deciding that the benefit was
worth many hours of preparation, discussion, and execution that may well
end up reverted.

I do agree that this alone isn't enough, but that's an opinion worth as
much as any other.

> So we need a second opinion - people to review the edit to see whether the
> collateral damage will be too great. Since OSM is a classic example of "with
> many eyeballs, all bugs are shallow"
> (https://en.wikipedia.org/wiki/Linus%27s_Law), the challenge is to make sure
> enough eyeballs look at the proposed automated edit to see if there are any
> bugs in it.
>
> To ensure this, those proposing an automated edit need to put it in front of
> people's eyeballs. There are good ways to do that - particularly these
> mailing lists.

Agreed, there can't be such a general rule beyond "discuss your plans",
even if the problem of what to do when there is no 100% consensus
remains a big concern.

-----

On 14/07/2016 17:38, Michael Reichert wrote:
> A mechanical edit must not cause damage. Therefore a mechanical edit
> which has damaged some data (damage > 0) should be reverted.

It can't be this simple. All these choices aim at *better data in the
long run*. So there are trade-offs that could be acceptable (case by
case, discussed with the community, or at least the part of it that
participates in mailing lists or forums).

If the edit was discussed and approved, and damage that was considered
acceptable is discovered after the fact, or damage that doesn't call
into question the validity of the whole changeset (i.e. no risk of many
more unnoticed damages),
then the changeset shouldn't be systematically reverted.

-----

On 14/07/2016 18:03, Andy Townsend wrote:
> On 14/07/2016 16:19, Éric Gillet wrote:
>> So if the changeset correct 300 restaurants but 2 are "damaged" by the
>> automated edit, would the edit be bad enough to be reverted or not be
>> done in the first place ?
>
> I'd revert it.  It's essentially the same as the "trees" example
> upthread

Not if the edit was discussed. And don't worry, there is no way that
such an edit with names will get a consensus. ^^

> You might argue "but surely if more data is corrected than damaged the
> overall quality is improved?" but you'd be wrong.  It's important to
> leave as much original data there as possible for downstream
> processing.

I agree that we can't always answer yes to that question ("more data is
corrected than damaged, so it's acceptable").
But can we always say that (damage > 0) is unacceptable?
Isn't there a single example of something that should be fixed even
though we know some insignificant errors may happen? By insignificant I
mean that we can agree it's not a problem compared to the rest of the
fix, which targets data that is *already* damaged.


Cheers,
-- 
tuxayo
