[OSM-talk] Report on the OSMF 2021 Survey after One Week

Joseph Eisenberg joseph.eisenberg at gmail.com
Sun Jan 24 21:53:56 UTC 2021


> Re: "In an ideal world we would a) identify all members of the OSM global
community and b) conduct a census of them.  This is impossible.  We thus
conduct a survey, make it as large as possible, and advertise it widely so
as to reach as many corners of the OSM community as possible."

The better option would be to:
1) identify the community
2) create a randomized statistical sample of this larger community
3) survey just this randomly-selected subset.

With that technique, about 1,000 completed surveys would give a much less
biased result than the current method of soliciting voluntary survey
submissions, though the cost would be greater, since OSMF would have to
actively reach out to the selected individuals.
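
As a rough illustration of steps 2 and 3 above, the following Python sketch
draws a simple random sample from a list of identified community members and
computes the usual worst-case margin of error for a proportion. The
population list, its size of 50,000 and the function names are assumptions
made only for this example; they are not OSMF data or an existing tool.

```python
import math
import random


def draw_sample(population_ids, n, seed=0):
    """Pick n distinct members uniformly at random (simple random sample)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(population_ids, n)


def margin_of_error(n, z=1.96, p=0.5):
    """Margin of error for a proportion; p=0.5 is the worst case,
    z=1.96 corresponds to a 95% confidence level."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical community of 50,000 identified members (assumption).
population_ids = list(range(50_000))

sample = draw_sample(population_ids, 1000)
moe = margin_of_error(1000)

print(f"sampled {len(sample)} members, margin of error ~ {moe:.3f}")
```

For 1,000 randomly selected respondents this gives a margin of roughly plus
or minus 3.1 percentage points at the 95% confidence level, and only if the
selected people actually respond.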

By asking people to volunteer for the survey, you will get a strong
self-selection bias, and that bias does not shrink as the number of
responses grows: https://en.wikipedia.org/wiki/Self-selection_bias and
https://en.wikipedia.org/wiki/Participation_bias
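
A small simulation can make this point concrete. The sketch below is purely
hypothetical: it assumes that 30% of the community holds some opinion, and
that people who hold it volunteer with probability 0.30 while everyone else
volunteers with probability 0.10. None of these numbers come from the survey;
they are chosen only to show the mechanism.

```python
import random


def simulate(pop_size, true_share=0.30, p_yes=0.30, p_no=0.10,
             n_random=1000, seed=1):
    """Return (volunteer_estimate, random_sample_estimate) of true_share."""
    rng = random.Random(seed)
    # True means the member holds the opinion being measured.
    population = [rng.random() < true_share for _ in range(pop_size)]
    # Voluntary survey: holders of the opinion are more likely to answer.
    volunteers = [holds for holds in population
                  if rng.random() < (p_yes if holds else p_no)]
    # Random survey: n_random members picked uniformly, all assumed to answer.
    random_sample = rng.sample(population, n_random)
    return (sum(volunteers) / len(volunteers),
            sum(random_sample) / len(random_sample))


for size in (10_000, 200_000):
    vol, rnd = simulate(size)
    print(f"population {size}: volunteers {vol:.2f}, random sample {rnd:.2f}")
```

Under these assumptions the volunteer estimate settles near
0.09 / 0.16, about 0.56, however many responses are collected, while the
1,000-person random sample stays close to the true 0.30.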

The biggest problem with this survey is that it is not clear what
population it is intended to represent. Who would be included in "all
members of the OSM global community"? Is it only active mappers, or mappers
plus direct database users?

Or does it include everyone who uses maps based on OpenStreetMap data? If
the latter, this would include basically all Facebook users and the users of
many apps and websites which use Mapbox or other services.

Without clearly defining this beforehand, it's not really possible to know
how useful this survey will be.

(Note that these issues of sampling, self-selection and participation bias
do not cover the larger problems of differing cultural and linguistic
interpretations of the questions, or of possible bias in which questions
are asked. I believe other people have already mentioned these problems;
see e.g. https://en.wikipedia.org/wiki/Response_bias and
https://en.wikipedia.org/wiki/Total_survey_error)

-- Joseph Eisenberg

On Sun, Jan 24, 2021 at 1:26 PM Allan Mustard <allan.mustard at osmfoundation.org> wrote:

> Niels, et al,
>
> The Central Limit Theorem does not predict that answers to a questionnaire
> will follow a normal distribution.  Rather,
>
> > when independent random variables are added, their properly normalized
> > sum tends toward a normal distribution (informally a bell curve) even if
> > the original variables themselves are not normally distributed. The theorem
> > is a key concept in probability theory because it implies that
> > probabilistic and statistical methods that work for normal distributions
> > can be applicable to many problems involving other types of distributions.
> >
> > For example, suppose that a sample is obtained containing many
> > observations, each observation being randomly generated in a way that does
> > not depend on the values of the other observations, and that the arithmetic
> > mean of the observed values is computed. If this procedure is performed
> > many times, the central limit theorem says that the probability
> > distribution of the average will closely approximate a normal
> > distribution.[1]
>
> What this means in practice is that larger samples will typically yield
> more accurate estimates of the parameters one would obtain by conducting a
> census of the population.  Statisticians consider any sample exceeding
> 1,067 observations a "large sample".  As of 16:00 hours 24 January, the
> survey had collected 1575 full responses (i.e., including demographic data)
> and 2127 total responses (i.e., responses lacking some or all demographic
> data).  Statisticians will call that a "large sample", and it continues to
> grow.
>
> In an ideal world we would a) identify all members of the OSM global
> community and b) conduct a census of them.  This is impossible.  We thus
> conduct a survey, make it as large as possible, and advertise it widely so
> as to reach as many corners of the OSM community as possible.  We have
> translated it into 15 languages from the original English.  We are
> promoting it through all manner of communications channels.  If we are
> missing something, please suggest how to address it.
>
> The OSM community is clearly not normally distributed when compared to the
> global population.  Since activity in the OSMverse requires computer
> literacy, the OSM community is more educated and somewhat wealthier than
> average (virtually all if not all OSMers have access to a computer and the
> internet), and based on Pascal Neis's OSMstat data, it is heavily biased
> toward Europe.[2]  Nonetheless, the Central Limit Theorem assures us that a
> large sample of the OSM community, if not restricted to a particular
> segment of the population, can provide estimates that should be close to
> the population's actual statistics, if they could be collected.
>
> [1] https://en.wikipedia.org/wiki/Central_limit_theorem
> [2] https://osmstats.neis-one.org/?item=countries
>
>
> From: Niels Elgaard Larsen <elgaard at agol.dk>
> To: talk at openstreetmap.org
> Subject: Re: [OSM-talk] Report on the OSMF 2021 Survey after One Week
> Message-ID: <fc1ef304-c8ad-8a46-b3a1-16e3c8613b24 at agol.dk>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Allan Mustard:
>
> > Third, while indeed about one quarter of respondents decline to provide
> > the optional demographic data, we are on track to collect enough "full"
> > surveys (i.e., including demographic data) to surpass a 3% confidence
> > interval at the 99% confidence level.[5]
>
> What makes you believe that the answers follow a normal distribution?
> It will be interesting to see if it looks like a normal distribution.
>
>
> --
> -------
> *Allan Mustard, Chairperson*
> *Board of Directors*
> *OpenStreetMap Foundation*