[OSM-legal-talk] decision removing data

Frederik Ramm frederik at remote.org
Wed Aug 4 14:40:37 BST 2010


Hi,

80n wrote:
> This quickly gets quite complex when factored across multiple 
> generations of way splits.

You're right, let's just ignore way splits altogether then ;)

> Changesets are a relatively recent invention.  Edits prior to the 
> introduction of changesets don't have any formal grouping so this 
> approach will not work for old data.

When changesets were introduced, changesets were synthesized for all 
old editing sessions according to a reasonably clever scheme that 
should indeed catch any split ways.

> Even older data that was converted from segments will have no history at 
> all because it was discarded. 

The amount of data that existed back then was relatively small. It is 
trivial to find out the list of contributors for each way that still 
exists today. (The database still lives on a backup.) Again, there may 
be fringe cases that get overlooked which we will then fix if someone 
complains, but on the whole I think this is manageable.

(The thing I'm more concerned about is what happened when users changed 
their names; back then we only recorded names, not IDs, so it might be 
difficult to trace back a contributor through name changes. But again, 
this is something where we should not throw the baby out with the 
bathwater - if we, despite best efforts, overlook someone's changes 
because they have changed their username, and accidentally keep what we 
shouldn't have kept, let them complain and then we'll fix it.)

>     Such auto-detection could be limited to areas where we have recorded
>     contributions that are not being relicensed; in all other areas we
>     would not have to bother.
> 
> Prolific editors don't tend to restrict their activity to a single 
> location.  This might be more widespread than anticipated.

Prolific editors also tend not to leave the project in a huff.

>     Any such mechanism, in my eyes, need not be 100% perfect; it is
>     sufficient to make an honest attempt at doing the right thing, and if
>     a few things slip through, then fix them in case of complaints.
> 
> Anyone who cares strongly enough to not want to relicense their work 
> will probably make a lot of complaints if their work is not fully 
> purged. This could generate a very large amount of manual remediation.

I think we're already doing a *lot* in respecting their wish, planning 
to use a huge amount of manpower to actually purge their data. You know 
that there are voices who say let's just relicense everything and ignore 
these people - we won't do that because we think that if they want their 
data removed, even if it hurts the project, it is prudent to do it. We 
don't even say "remove it yourself" (and you know that there are voices 
who recommend that - simply declare that everyone who doesn't want their 
data relicensed at date X should please remove it now). They just have 
to say it and we will try to remove their contributions as well as we can.

I think it is safe to assume that those who "care strongly" will most 
certainly not be silent, no matter how much diligence is invested on our 
part. If all else fails, they will claim intellectual property on the 
corner pub that has been placed where they drew the roads. For those who 
"care strongly", leaving the project is probably painful, or sad, and in 
many cases they will spend time or even money to make the process 
painful for us as well, if only to prove that they were right in 
forecasting trouble.

This will happen no matter how diligent we are in removing their 
contribution. We might as well not make an effort at all; the amount of 
ire will probably be roughly the same.

> If there is anything under development it would be good if we could see 
> it.  It is unlikely to be a trivial piece of code and I'd be very 
> surprised if it can be developed by September 1st if it hasn't already 
> been started.

I am not aware of a deadline like that. If I were the LWG, I would not 
want to get into discussions about what exactly has to be removed in 
which case before I know how many people agree to relicense their data.

> The whole relicensing effort would be a bit of a non-starter if this 
> deletion process cannot be done.

I'm sure it can be done. I'm also pretty sure it can never be discussed 
to everybody's satisfaction on this mailing list, so I'm all for 
postponing that until we have the acceptance figures.

Bye
Frederik
