[Talk-GB] an estimate of data loss under relicensing
TimSC
mapping at sheerman-chase.org.uk
Thu Jul 22 10:34:24 BST 2010
Hi all,
To try to get a feeling for the potential consequences of relicensing, I
have been doing analysis of edits in the UK and how contributors have
voted on the doodle poll. I feel that we should look before we leap,
regarding the possible impact of people who refuse to relicense. I
wondered how many nodes, ways and relations would be transitioned in
relicensing. I used the crude assumption that each object has only one
editor, which would underestimate the impact of refuser contributions. I
requested the biggest contributors to vote on the doodle poll to improve
the turn out. Although I only have votes for 1% of individual UK
contributors, doodle now has a 24% turn out when weighted by mapping
contribution size. A few mappers account for a large proportion of UK
data. I had not previously noticed how many mappers have made only a few
small changes: the median number of nodes contributed is only 10! I also
have not considered the response rate once OSMF pitch the question to
contributors, and what happens if the OS data cannot be relicensed.
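
To illustrate how a 1% head-count turn out can still be a much larger turn out when weighted by contribution size, here is a minimal sketch. The contributor names and node counts are made up for illustration and are not real OSM or doodle data; the function names are likewise hypothetical.

```python
from statistics import median

# Hypothetical node counts per contributor (illustrative only, not real data).
nodes = {
    "alice": 700,
    "bob": 200,
    "carol": 50,
    "dave": 30,
    "erin": 10,
    "frank": 10,
}

# Hypothetical set of contributors who have voted on the poll.
voted = {"bob"}


def headcount_turnout(nodes, voted):
    """Fraction of contributors who voted, each counted once."""
    return sum(1 for user in nodes if user in voted) / len(nodes)


def weighted_turnout(nodes, voted):
    """Fraction of all nodes that belong to contributors who voted."""
    total = sum(nodes.values())
    return sum(n for user, n in nodes.items() if user in voted) / total


def median_contribution(nodes):
    """Median number of nodes contributed per mapper."""
    return median(nodes.values())
```

Because a handful of mappers hold most of the data, a single vote from a large contributor moves the weighted figure far more than the head count.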
Next, I want to give my excuses for not publishing the raw statistics.
Even with 24% turn out (by contribution size), there are a few
non-committal large contributors (e.g. me and a few others). Unless the
turn out rate is higher, the stats can be twisted depending on the mood
I am in. But there is a pattern emerging. The overall UK picture seems
to be fairly bright for minimal data loss. Every big contributor I
contact votes "yes" to relicensing (with or without reservations). I
estimate an overall data loss of 5% to 17% for the UK (ignoring the
effect of objects with multiple editors).
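
One way a range like this could arise is by counting the non-committal voters first as "yes" and then as "no". The sketch below makes that idea concrete under stated assumptions: each object has exactly one editor (the crude assumption above), only contributors who voted are counted, and the vote labels, user names and object counts are invented for illustration. It is not necessarily how the 5% to 17% figures were produced.

```python
# Hypothetical object counts per contributor (illustrative only).
objects = {"u1": 600, "u2": 250, "u3": 100, "u4": 50}

# Hypothetical poll answers: "yes", "no" or "undecided".
votes = {"u1": "yes", "u2": "yes", "u3": "undecided", "u4": "no"}


def loss_bounds(objects, votes):
    """Return (best_case, worst_case) loss fractions among voters.

    Best case: only "no" voters' objects are lost.
    Worst case: "undecided" voters' objects are lost as well.
    Assumes one editor per object, so multi-editor objects are ignored.
    """
    voted_total = sum(objects[user] for user in votes)
    refused = sum(objects[u] for u, v in votes.items() if v == "no")
    undecided = sum(objects[u] for u, v in votes.items() if v == "undecided")
    return refused / voted_total, (refused + undecided) / voted_total


best, worst = loss_bounds(objects, votes)
print(f"estimated loss between {best:.0%} and {worst:.0%}")
```

The gap between the two bounds is entirely down to the undecided contributors, which is why a higher turn out would narrow the estimate.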
The main exception to this is a small cluster of refusers around London.
(I am not just talking about myself here.) The worst case scenario is
50% data loss in the Greater London area but, really, I don't know how
it would play out. Because of the density of mapping, there are more
likely to be objects with multiple editors in this area too. Basically, it's a wild
card. But I would be surprised if there are big problems outside the
London/SE area. Unless of course 5% is a big problem - I am not too sure
how much work it would take to patch up omissions, even assuming a
relatively smooth transition.
Anyway, I never was much good at statistics! I just wanted to circulate
something, after many contributors were kind enough to honour my request
and vote on doodle.
TimSC