[Talk-GB] an estimate of data loss under relicensing

TimSC mapping at sheerman-chase.org.uk
Thu Jul 22 10:34:24 BST 2010


Hi all,

To try to get a feeling for the potential consequences of relicensing, I 
have been doing analysis of edits in the UK and how contributors have 
voted on the doodle poll. I feel that we should look before we leap, 
regarding the possible impact of people who refuse to relicense. I 
wondered how many nodes, ways and relations would be transitioned in 
relicensing. I used the crude assumption that each object has only one 
editor, which would underestimate the impact of refuser contributions. I 
asked the biggest contributors to vote on the doodle poll to improve 
the turnout. Although I only have votes for 1% of individual UK 
contributors, doodle now has a 24% turnout when weighted by mapping 
contribution size. A few mappers account for a large proportion of UK 
data. I had not previously noticed how many mappers had made only a few 
small changes: the median number of nodes contributed is only 10! I also 
have not considered the response rate once OSMF pitch the question to 
contributors, and what happens if the OS data cannot be relicensed.
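
To illustrate why turnout by headcount and turnout weighted by contribution 
size can differ so much, here is a minimal sketch. The user names, node 
counts and votes are invented for illustration and are not the real UK 
figures; it also keeps the crude assumption that each node has one editor.

```python
from statistics import median

# Hypothetical node counts per contributor (invented, not real OSM data).
nodes_by_user = {
    "a": 50000, "b": 30000, "c": 12, "d": 10, "e": 8, "f": 5, "g": 3,
}
# Hypothetical poll answers; users missing from this dict did not vote.
votes = {"a": "yes", "c": "no"}


def headcount_turnout(nodes_by_user, votes):
    """Fraction of contributors who voted, each counted once."""
    voters = [u for u in nodes_by_user if u in votes]
    return len(voters) / len(nodes_by_user)


def weighted_turnout(nodes_by_user, votes):
    """Fraction of all nodes that belong to contributors who voted."""
    total = sum(nodes_by_user.values())
    voted = sum(n for u, n in nodes_by_user.items() if u in votes)
    return voted / total


print(f"headcount turnout: {headcount_turnout(nodes_by_user, votes):.1%}")
print(f"weighted turnout:  {weighted_turnout(nodes_by_user, votes):.1%}")
print(f"median nodes per contributor: {median(nodes_by_user.values())}")
```

With a few very large contributors in the toy data, a single big voter 
pulls the weighted turnout far above the headcount turnout, while the 
median contribution stays small.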

Next, I want to give my excuses for not publishing the raw statistics. 
Even with 24% turnout (by contribution size), there are a few 
non-committal large contributors (e.g. me and a few others). Unless the 
turnout rate is higher, the stats can be twisted depending on the mood 
I am in. But there is a pattern emerging. The overall UK picture seems 
to be fairly bright for minimal data loss. Every big contributor I 
contact votes "yes" to relicensing (with or without reservations). I 
estimate an overall data loss of 5% to 17% for the UK (ignoring the 
effect of objects with multiple editors).
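
One way a range like this can arise is from how the non-committal voters 
are counted. The sketch below is only an assumption about the method, not 
the actual calculation: the lower bound treats undecided voters as 
agreeing, the upper bound treats them as refusing, and shares are taken 
over voters' nodes only. The names and numbers are hypothetical.

```python
# Hypothetical poll results: user -> (nodes contributed, answer).
# Invented numbers; each node is attributed to a single editor.
poll = {
    "user_a": (800, "yes"),
    "user_b": (100, "undecided"),
    "user_c": (100, "no"),
}


def loss_bounds(poll):
    """Return (low, high) fraction of voters' nodes that would be lost.

    low:  only explicit refusers' nodes are lost.
    high: undecided voters are assumed to refuse as well.
    """
    total = sum(n for n, _ in poll.values())
    refused = sum(n for n, ans in poll.values() if ans == "no")
    undecided = sum(n for n, ans in poll.values() if ans == "undecided")
    return refused / total, (refused + undecided) / total


low, high = loss_bounds(poll)
print(f"estimated loss between {low:.0%} and {high:.0%}")
```

Objects touched by several editors are ignored here, so a real figure 
could be higher than either bound.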

The main exception to this is a small cluster of refusers around London. 
(I am not just talking about myself here.) The worst case scenario is 
50% data loss in the Greater London area but, really, I don't know how 
it would play out. Because of the density of mapping, objects are more 
likely to have multiple editors in this area too. Basically, it's a wild 
card. But I would be surprised if there are big problems outside the 
London/SE area. Unless of course 5% is a big problem - I am not too sure 
how much work it would take to patch up omissions, even assuming a 
relatively smooth transition.

Anyway, I never was much good at statistics! I just wanted to circulate 
something, after many contributors were kind enough to honour my request 
and vote on doodle.

TimSC




