<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">> Re: "<span style="font-family:Helvetica,Arial,sans-serif">In an ideal world we would a) identify all members of the OSM global community and b) conduct a census of them. This is impossible. We thus conduct a survey, make it as large as possible, and advertise it widely so as to reach as many corners of the OSM community as possible."</span></div><div dir="ltr"><span style="font-family:Helvetica,Arial,sans-serif"><br></span></div><div><span style="font-family:Helvetica,Arial,sans-serif">The better option would be to:</span></div><div><span style="font-family:Helvetica,Arial,sans-serif">1) identify the community</span></div><div><span style="font-family:Helvetica,Arial,sans-serif">2) create a randomized statistical sample of this larger community </span></div><div><span style="font-family:Helvetica,Arial,sans-serif">3) survey just this randomly-selected subset.</span></div><div><span style="font-family:Helvetica,Arial,sans-serif"><br></span></div><div><font face="Helvetica, Arial, sans-serif">With that technique you could just do 1000 surveys and get a much less biased survey response than with the current method of soliciting voluntary survey submissions, though the cost would be greater, since OSMF would have to actively reach out to the individuals.</font></div><div><font face="Helvetica, Arial, sans-serif"><br></font></div><div><font face="Helvetica, Arial, sans-serif">By asking people to volunteer for the survey, you will get a strong self-selection bias which will not improve by increasing the number of surveys: </font><a href="https://en.wikipedia.org/wiki/Self-selection_bias">https://en.wikipedia.org/wiki/Self-selection_bias</a> and <a href="https://en.wikipedia.org/wiki/Participation_bias">https://en.wikipedia.org/wiki/Participation_bias</a></div><div><br></div><div>The biggest problem with this survey is that it is not clear what 
population it is intended to represent. Who would be included in "all members of the OSM global community"? Is it active mappers only, or mappers plus direct database users?</div><div><br></div><div>Or does it include everyone who uses maps based on OpenStreetMap data? If the latter, this would include basically all Facebook users and users of many apps / websites that use Mapbox or other services.</div><div><br></div><div>Without clearly defining this beforehand, it's not really possible to know how useful this survey will be.</div><div><br></div><div>(Note that these issues with sampling, selection and participation bias do not even touch the bigger issues of differing cultural and linguistic interpretations of the questions, and of possible bias in which questions are asked, but I believe other people have already mentioned these problems: e.g. <a href="https://en.wikipedia.org/wiki/Response_bias">https://en.wikipedia.org/wiki/Response_bias</a> and <a href="https://en.wikipedia.org/wiki/Total_survey_error">https://en.wikipedia.org/wiki/Total_survey_error</a> etc.)</div><div><br></div><div>-- Joseph Eisenberg</div></div></div></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Jan 24, 2021 at 1:26 PM Allan Mustard &lt;<a href="mailto:allan.mustard@osmfoundation.org">allan.mustard@osmfoundation.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div>
<p><font face="Helvetica, Arial, sans-serif">Niels, et al,</font></p>
<p><font face="Helvetica, Arial, sans-serif">The Central Limit
Theorem does not predict that answers to a questionnaire will
follow a normal distribution. Rather,<br>
</font></p>
<p><font face="Helvetica, Arial, sans-serif">> when independent
random variables are added, their properly normalized sum tends
toward a normal distribution (informally a bell curve) even if
the original variables themselves are not normally distributed.
The theorem is a key concept in probability theory because it
implies that probabilistic and statistical methods that work for
normal distributions can be applicable to many problems
involving other types of distributions.</font></p>
<p><font face="Helvetica, Arial, sans-serif">> For example,
suppose that a sample is obtained containing many observations,
each observation being randomly generated in a way that does not
depend on the values of the other observations, and that the
arithmetic mean of the observed values is computed. If this
procedure is performed many times, the central limit theorem
says that the probability distribution of the average will
closely approximate a normal distribution.[1]</font></p>
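<p><font face="Helvetica, Arial, sans-serif">The quoted statement can be checked with a small simulation. The following is a minimal Python sketch, not part of the original exchange: it assumes independent draws from an exponential distribution (strongly skewed, mean 1, standard deviation 1), and the function names are illustrative. It shows only that the means of repeated random samples cluster around the true mean with a spread of about sigma / sqrt(n).</font></p>

```python
import random
import statistics


def draw_skewed(rng):
    """One observation from an exponential distribution (mean 1, standard deviation 1)."""
    return rng.expovariate(1.0)


def sample_means(sample_size, repetitions, seed=0):
    """Repeat the sampling procedure and return the arithmetic mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw_skewed(rng) for _ in range(sample_size))
        for _ in range(repetitions)
    ]


def share_within(means, center, spread, z=1.96):
    """Fraction of sample means lying within z spreads of the center."""
    hits = sum(1 for m in means if z * spread >= abs(m - center))
    return hits / len(means)


if __name__ == "__main__":
    n = 400
    means = sample_means(sample_size=n, repetitions=2000)
    predicted_spread = 1.0 / n ** 0.5  # sigma / sqrt(n), with sigma = 1
    print("mean of sample means:", round(statistics.fmean(means), 3))
    print("observed spread:", round(statistics.stdev(means), 4))
    print("predicted spread:", predicted_spread)
    print("share within 1.96 spreads:", share_within(means, 1.0, predicted_spread))
```

<p><font face="Helvetica, Arial, sans-serif">If the theorem applies, roughly 95% of the sample means should fall within 1.96 spreads of the true mean. The result depends on every observation being drawn at random and independently of the others.</font></p>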
<p><font face="Helvetica, Arial, sans-serif">What this means in
practice is that larger samples will typically yield more
accurate estimates of the parameters one would obtain by
conducting a census of the population. Statisticians consider
any sample exceeding 1,067 observations a "large sample". As of
16:00 hours on 24 January, the survey had collected 1,575 full
responses (i.e., including demographic data) and 2,127 total
responses (i.e., also counting those lacking some or all demographic
data). Statisticians will call that a "large sample", and it
continues to grow.<br>
</font></p>
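<p><font face="Helvetica, Arial, sans-serif">The 1,067 figure appears to correspond to the standard sample-size formula for a proportion, n = z^2 * p * (1 - p) / e^2, evaluated for a 3% margin of error at 95% confidence with the worst case p = 0.5. The following is a minimal Python sketch of that arithmetic, assuming a simple random sample (which a voluntary survey is not); the function names are illustrative.</font></p>

```python
import math
from statistics import NormalDist


def z_score(confidence):
    """Two-sided critical value of the standard normal for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def required_sample_size(margin, confidence, p=0.5):
    """Smallest n whose margin of error does not exceed `margin` (simple random sample assumed)."""
    z = z_score(confidence)
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def margin_of_error(n, confidence, p=0.5):
    """Half-width of the confidence interval for a proportion estimated from n random responses."""
    return z_score(confidence) * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # The raw value at 3% and 95% is about 1067.07, so the sample must exceed 1,067.
    print(required_sample_size(0.03, 0.95))
    print(required_sample_size(0.03, 0.99))
    print(round(margin_of_error(1575, 0.95), 4))
    print(round(margin_of_error(1575, 0.99), 4))
```

<p><font face="Helvetica, Arial, sans-serif">Under those assumptions, 1,575 full responses give a margin of roughly 2.5% at 95% confidence and about 3.2% at 99% confidence. The formula covers sampling error only and says nothing about who chose to respond.</font></p>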
<p><font face="Helvetica, Arial, sans-serif">In an ideal world we
would a) identify all members of the OSM global community and b)
conduct a census of them. This is impossible. We thus conduct
a survey, make it as large as possible, and advertise it widely
so as to reach as many corners of the OSM community as
possible. We have translated it into 15 languages from the
original English. We are promoting it through all manner of
communications channels. If we are missing something, please
suggest how to address it. <br>
</font></p>
<p><font face="Helvetica, Arial, sans-serif">The OSM community is
clearly not representative of the global
population. Since activity in the OSMverse requires computer
literacy, the OSM community is more educated and somewhat
wealthier than average (virtually all if not all OSMers have
access to a computer and the internet), and based on Pascal
Neis's OSMstat data, it is heavily biased toward Europe.[2]
Nonetheless, the Central Limit Theorem assures us that a large
sample of the OSM community, if not restricted to a particular
segment of the population, can provide estimates that should be
close to the population's actual statistics, if they could be
collected.<br>
</font></p>
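<p><font face="Helvetica, Arial, sans-serif">The reply at the top of this message argues that self-selection bias does not shrink as the number of responses grows. The following toy simulation in Python sketches that argument. Every number in it is a made-up assumption, not survey data: 30% of a hypothetical population answers "yes", and "yes" holders are three times as likely to volunteer as everyone else.</font></p>

```python
import random

TRUE_SHARE = 0.3         # hypothetical share of the population answering "yes"
RESPONSE_RATE_YES = 0.6  # hypothetical chance that a "yes" holder volunteers
RESPONSE_RATE_NO = 0.2   # hypothetical chance that a "no" holder volunteers


def build_population(size, rng):
    """List of booleans, True meaning the member would answer "yes"."""
    return [TRUE_SHARE > rng.random() for _ in range(size)]


def random_sample_estimate(population, n, rng):
    """Estimate the "yes" share from n members chosen uniformly at random."""
    sample = rng.sample(population, n)
    return sum(sample) / n


def voluntary_estimate(population, n, rng):
    """Estimate the "yes" share from the first n members who volunteer a response."""
    responses = []
    for says_yes in population:
        rate = RESPONSE_RATE_YES if says_yes else RESPONSE_RATE_NO
        if rate > rng.random():
            responses.append(says_yes)
            if len(responses) == n:
                break
    return sum(responses) / len(responses)


def compare(sizes=(100, 1000, 10000), population_size=200000, seed=1):
    """Return (n, random-sample estimate, voluntary estimate) for each sample size."""
    rng = random.Random(seed)
    population = build_population(population_size, rng)
    return [
        (n, random_sample_estimate(population, n, rng), voluntary_estimate(population, n, rng))
        for n in sizes
    ]


if __name__ == "__main__":
    for n, random_est, voluntary_est in compare():
        print(n, round(random_est, 3), round(voluntary_est, 3))
```

<p><font face="Helvetica, Arial, sans-serif">In this toy model the random-sample estimate settles near the true 0.3 as n grows, while the voluntary estimate settles near 0.18 / 0.32 = 0.5625. A larger voluntary sample narrows the spread around the biased value, not around the true one.</font></p>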
<p><font face="Helvetica, Arial, sans-serif">[1]
<a href="https://en.wikipedia.org/wiki/Central_limit_theorem" target="_blank">https://en.wikipedia.org/wiki/Central_limit_theorem</a><br>
[2] <a href="https://osmstats.neis-one.org/?item=countries" target="_blank">https://osmstats.neis-one.org/?item=countries</a><br>
</font></p>
<p><font face="Helvetica, Arial, sans-serif"><br>
</font></p>
<p><font face="Helvetica, Arial, sans-serif"></font>
</p><blockquote type="cite">From: Niels Elgaard Larsen
<a href="mailto:elgaard@agol.dk" target="_blank">&lt;elgaard@agol.dk&gt;</a><br>
To: <a href="mailto:talk@openstreetmap.org" target="_blank">talk@openstreetmap.org</a><br>
Subject: Re: [OSM-talk] Report on the OSMF 2021 Survey after One
Week<br>
Message-ID: <a href="mailto:fc1ef304-c8ad-8a46-b3a1-16e3c8613b24@agol.dk" target="_blank">&lt;fc1ef304-c8ad-8a46-b3a1-16e3c8613b24@agol.dk&gt;</a><br>
Content-Type: text/plain; charset=utf-8; format=flowed<br>
<br>
Allan Mustard:<br>
<br>
> Third, while indeed about one quarter of respondents
decline to provide the optional <br>
> demographic data, we are on track to collect enough "full"
surveys (i.e., including <br>
> demographic data) to surpass a 3% confidence interval at
the 99% confidence level.[5]<br>
<br>
What makes you believe that the answers follow a normal
distribution?<br>
It will be interesting to see if it looks like a normal
distribution.</blockquote>
<br>
<p></p>
<div>-- <br>
-------<br>
<i>Allan Mustard, Chairperson</i><br>
<i>Board of Directors</i><br>
<i>OpenStreetMap Foundation</i></div>
</div>
</blockquote></div>