[OSM-talk] Report on the OSMF 2021 Survey after One Week

Allan Mustard allan.mustard at osmfoundation.org
Sun Jan 24 21:20:14 UTC 2021


Niels, et al,

The Central Limit Theorem does not predict that answers to a
questionnaire will follow a normal distribution.  Rather,

> when independent random variables are added, their properly normalized
> sum tends toward a normal distribution (informally a bell curve) even if
> the original variables themselves are not normally distributed. The
> theorem is a key concept in probability theory because it implies that
> probabilistic and statistical methods that work for normal distributions
> can be applicable to many problems involving other types of distributions.

> For example, suppose that a sample is obtained containing many
> observations, each observation being randomly generated in a way that
> does not depend on the values of the other observations, and that the
> arithmetic mean of the observed values is computed. If this procedure is
> performed many times, the central limit theorem says that the
> probability distribution of the average will closely approximate a
> normal distribution.[1]
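
The procedure described in the quote can be tried out in a few lines of
Python.  This is only a sketch under stated assumptions: the
observations are drawn from an exponential distribution (chosen here
because it is strongly skewed, not because survey answers look like
that), and the names sample_means and skewness are made up for this
illustration.  It compares the skewness of single observations with the
skewness of averages of 100 observations; a value near zero indicates a
roughly symmetric, bell-like shape.

```python
import random
import statistics


def sample_means(n_obs, n_repeats, seed=0):
    """Draw n_repeats samples of n_obs exponential observations each
    and return the arithmetic mean of every sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n_obs))
        for _ in range(n_repeats)
    ]


def skewness(values):
    """Population skewness: the mean cubed z-score (0 for a symmetric shape)."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / spread) ** 3 for v in values)


if __name__ == "__main__":
    for n_obs in (1, 10, 100):
        means = sample_means(n_obs, 5000)
        print(
            f"n_obs={n_obs:>3}  "
            f"mean of means={statistics.fmean(means):.3f}  "
            f"std of means={statistics.pstdev(means):.3f}  "
            f"skewness={skewness(means):.3f}"
        )
```

With one observation per sample the skewness stays large; as the number
of observations per sample grows, the averages cluster more tightly
around the true mean of 1.0 and their skewness shrinks toward zero.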

What this means in practice is that larger samples will typically yield
more accurate estimates of the parameters one would obtain by conducting
a census of the population.  A simple random sample of about 1,067
observations is enough for a 3% margin of error at the 95% confidence
level, and statisticians commonly treat samples of that size as "large".
As of 16:00 hours 24 January, the survey had collected 1575 full
responses (i.e., including demographic data) and 2127 total responses
(i.e., including those lacking some or all demographic data).  By that
standard it is a "large sample", and it continues to grow.
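
For anyone who wants to check the arithmetic, here is a rough sketch of
the usual margin-of-error calculation for a proportion.  It assumes a
simple random sample, the worst-case proportion p = 0.5, and a
population large enough that no finite-population correction is needed;
none of those assumptions is guaranteed for this survey.  The function
names are made up for the illustration.

```python
import math
from statistics import NormalDist


def z_score(confidence):
    """Two-sided critical value of the standard normal, e.g. about 1.96 for 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def margin_of_error(n, confidence, p=0.5):
    """Half-width of the confidence interval for a proportion p
    estimated from n observations (simple random sample assumed)."""
    return z_score(confidence) * math.sqrt(p * (1.0 - p) / n)


def required_sample_size(margin, confidence, p=0.5):
    """Smallest whole n whose margin of error does not exceed `margin`."""
    z = z_score(confidence)
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))


if __name__ == "__main__":
    # Response counts quoted in this message.
    for n in (1575, 2127):
        for confidence in (0.95, 0.99):
            moe = margin_of_error(n, confidence)
            print(f"n={n}  confidence={confidence:.0%}  margin={moe:.2%}")
    for confidence in (0.95, 0.99):
        n_needed = required_sample_size(0.03, confidence)
        print(f"3% margin at {confidence:.0%} confidence needs n >= {n_needed}")
```

The unrounded result for a 3% margin at 95% confidence is a little over
1,067, which is where that familiar figure comes from; rounding up to
the next whole response gives 1,068.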

In an ideal world we would a) identify all members of the OSM global
community and b) conduct a census of them.  This is impossible.  We thus
conduct a survey, make it as large as possible, and advertise it widely
so as to reach as many corners of the OSM community as possible.  We
have translated it into 15 languages from the original English.  We are
promoting it through all manner of communications channels.  If we are
missing something, please suggest how to address it. 

The OSM community is clearly not a representative cross-section of
the global population.  Since activity in the OSMverse requires computer
literacy, the OSM community is more educated and somewhat wealthier than
average (virtually all if not all OSMers have access to a computer and
the internet), and based on Pascal Neis's OSMstats data, it is heavily
biased toward Europe.[2]  Nonetheless, the Central Limit Theorem assures
us that a large sample of the OSM community, if not restricted to a
particular segment of the population, can provide estimates that should
be close to the population's actual statistics, if they could be collected.

[1] https://en.wikipedia.org/wiki/Central_limit_theorem
[2] https://osmstats.neis-one.org/?item=countries


> From: Niels Elgaard Larsen <elgaard at agol.dk>
> To: talk at openstreetmap.org
> Subject: Re: [OSM-talk] Report on the OSMF 2021 Survey after One Week
> Message-ID: <fc1ef304-c8ad-8a46-b3a1-16e3c8613b24 at agol.dk>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Allan Mustard:
>
> > Third, while indeed about one quarter of respondents decline to
> > provide the optional demographic data, we are on track to collect
> > enough "full" surveys (i.e., including demographic data) to surpass
> > a 3% confidence interval at the 99% confidence level.[5]
>
> What makes you believe that the answers follow a normal distribution?
> It will be interesting to see if it looks like a normal distribution.

-- 
-------
/Allan Mustard, Chairperson/
/Board of Directors/
/OpenStreetMap Foundation/