<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">> Re: "<span style="font-family:Helvetica,Arial,sans-serif">In an ideal world we would a) identify all members of the OSM global community and b) conduct a census of them.  This is impossible.  We thus conduct a survey, make it as large as possible, and advertise it widely so as to reach as many corners of the OSM community as possible."</span></div><div dir="ltr"><span style="font-family:Helvetica,Arial,sans-serif"><br></span></div><div><span style="font-family:Helvetica,Arial,sans-serif">The better option would be to:</span></div><div><span style="font-family:Helvetica,Arial,sans-serif">1) identify the community</span></div><div><span style="font-family:Helvetica,Arial,sans-serif">2) create a randomized statistical sample of this larger community </span></div><div><span style="font-family:Helvetica,Arial,sans-serif">3) survey just this randomly-selected subset.</span></div><div><span style="font-family:Helvetica,Arial,sans-serif"><br></span></div><div><font face="Helvetica, Arial, sans-serif">With that technique you could just do 1000 surveys and get a much less biased survey response than with the current method of soliciting voluntary survey submissions, though the cost would be greater, since OSMF would have to actively reach out to the individuals.</font></div><div><font face="Helvetica, Arial, sans-serif"><br></font></div><div><font face="Helvetica, Arial, sans-serif">By asking people to volunteer for the survey, you will get a strong self-selection bias which will not improve by increasing the number of surveys: </font><a href="https://en.wikipedia.org/wiki/Self-selection_bias">https://en.wikipedia.org/wiki/Self-selection_bias</a> and <a href="https://en.wikipedia.org/wiki/Participation_bias">https://en.wikipedia.org/wiki/Participation_bias</a></div><div><br></div><div>The biggest problem with this survey is that it is not clear what 
population it is intended to represent. Who would be included in "all members of the OSM global community"? Is it active mappers only, or mappers plus direct database users? </div><div><br></div><div>Or does it include everyone who uses maps based on OpenStreetMap data? If the latter, this would include basically all Facebook users and users of the many apps and websites that use Mapbox or other services.</div><div><br></div><div>Without clearly defining this beforehand, it's not really possible to know how useful this survey will be.</div><div><br></div><div>(Note that these sampling, self-selection, and participation biases are separate from the bigger issues of differing cultural and linguistic interpretations of the questions and of possible bias in which questions are asked. I believe other people have already mentioned these problems: see e.g. <a href="https://en.wikipedia.org/wiki/Response_bias">https://en.wikipedia.org/wiki/Response_bias</a> and <a href="https://en.wikipedia.org/wiki/Total_survey_error">https://en.wikipedia.org/wiki/Total_survey_error</a> etc.)</div><div><br></div><div>-- Joseph Eisenberg</div></div></div></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Jan 24, 2021 at 1:26 PM Allan Mustard <<a href="mailto:allan.mustard@osmfoundation.org">allan.mustard@osmfoundation.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

  <div>

    <p><font face="Helvetica, Arial, sans-serif">Niels, et al,</font></p>

    <p><font face="Helvetica, Arial, sans-serif">The Central Limit

        Theorem does not predict that answers to a questionnaire will

        follow a normal distribution.  Rather,<br>

      </font></p>

    <p><font face="Helvetica, Arial, sans-serif">> when independent

        random variables are added, their properly normalized sum tends

        toward a normal distribution (informally a bell curve) even if

        the original variables themselves are not normally distributed.

        The theorem is a key concept in probability theory because it

        implies that probabilistic and statistical methods that work for

        normal distributions can be applicable to many problems

        involving other types of distributions.</font></p>

    <p><font face="Helvetica, Arial, sans-serif">> For example,

        suppose that a sample is obtained containing many observations,

        each observation being randomly generated in a way that does not

        depend on the values of the other observations, and that the

        arithmetic mean of the observed values is computed. If this

        procedure is performed many times, the central limit theorem

        says that the probability distribution of the average will

        closely approximate a normal distribution.[1]</font></p>

    <p><font face="Helvetica, Arial, sans-serif">What this means in

        practice is that larger samples will typically yield more

        accurate estimates of the parameters one would obtain by

        conducting a census of the population.  Statisticians consider

        any sample exceeding 1,067 observations a "large sample".  As of

        16:00 hours 24 January, the survey had collected 1,575 full

        responses (i.e., including demographic data) and 2,127 total

        responses (i.e., counting those lacking some or all demographic

        data).  Statisticians will call that a "large sample", and it

        continues to grow.<br>

      </font></p>
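    <p><font face="Helvetica, Arial, sans-serif">For reference, the
        sample-size arithmetic behind figures such as 1,067 can be
        sketched in a few lines of Python. This is only a sketch under
        stated assumptions: a simple random sample, the worst-case
        proportion p = 0.5, and no finite-population correction. The
        function names are illustrative and not part of the survey
        tooling. Under these assumptions, roughly 1,067 responses
        correspond to a 3% margin at the 95% confidence level, and
        roughly 1,843 to a 3% margin at the 99% level.</font></p>

<pre>
```python
from math import sqrt
from statistics import NormalDist


def z_score(confidence):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def required_sample_size(margin, confidence, p=0.5):
    """Responses needed for a given margin of error on a proportion.

    Assumes simple random sampling; p=0.5 is the worst case.
    """
    z = z_score(confidence)
    return z ** 2 * p * (1 - p) / margin ** 2


def margin_of_error(n, confidence, p=0.5):
    """Margin of error on a proportion for n responses (same assumptions)."""
    return z_score(confidence) * sqrt(p * (1 - p) / n)


n95 = required_sample_size(0.03, 0.95)
n99 = required_sample_size(0.03, 0.99)
moe_full = margin_of_error(1575, 0.99)  # 1575 full responses so far

print(f"3% margin at 95%: about {n95:.0f} responses")
print(f"3% margin at 99%: about {n99:.0f} responses")
print(f"margin at 99% with 1575 responses: {moe_full:.1%}")
```
</pre>

    <p><font face="Helvetica, Arial, sans-serif">These formulas only
        describe sampling error for a random sample; they say nothing
        about bias introduced by how respondents are
        recruited.</font></p>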

    <p><font face="Helvetica, Arial, sans-serif">In an ideal world we

        would a) identify all members of the OSM global community and b)

        conduct a census of them.  This is impossible.  We thus conduct

        a survey, make it as large as possible, and advertise it widely

        so as to reach as many corners of the OSM community as

        possible.  We have translated it into 15 languages from the

        original English.  We are promoting it through all manner of

        communications channels.  If we are missing something, please

        suggest how to address it.  <br>

      </font></p>

    <p><font face="Helvetica, Arial, sans-serif">The OSM community is

        clearly not normally distributed when compared to the global

        population.  Since activity in the OSMverse requires computer

        literacy, the OSM community is more educated and somewhat

        wealthier than average (virtually all if not all OSMers have

        access to a computer and the internet), and based on Pascal

        Neis's OSMstat data, it is heavily biased toward Europe.[2] 

        Nonetheless, the Central Limit Theorem assures us that a large

        sample of the OSM community, if not restricted to a particular

        segment of the population, can provide estimates that should be

        close to the population's actual statistics, if they could be

        collected.<br>

      </font></p>

    <p><font face="Helvetica, Arial, sans-serif">[1]

        <a href="https://en.wikipedia.org/wiki/Central_limit_theorem" target="_blank">https://en.wikipedia.org/wiki/Central_limit_theorem</a><br>

        [2] <a href="https://osmstats.neis-one.org/?item=countries" target="_blank">https://osmstats.neis-one.org/?item=countries</a><br>

      </font></p>

    <p><font face="Helvetica, Arial, sans-serif"><br>

      </font></p>

    <p><font face="Helvetica, Arial, sans-serif"></font>

      </p><blockquote type="cite">From: Niels Elgaard Larsen

        <a href="mailto:elgaard@agol.dk" target="_blank"><elgaard@agol.dk></a><br>

        To: <a href="mailto:talk@openstreetmap.org" target="_blank">talk@openstreetmap.org</a><br>

        Subject: Re: [OSM-talk] Report on the OSMF 2021 Survey after One

        Week<br>

        Message-ID: <a href="mailto:fc1ef304-c8ad-8a46-b3a1-16e3c8613b24@agol.dk" target="_blank"><fc1ef304-c8ad-8a46-b3a1-16e3c8613b24@agol.dk></a><br>

        Content-Type: text/plain; charset=utf-8; format=flowed<br>

        <br>

        Allan Mustard:<br>

        <br>

        > Third, while indeed about one quarter of respondents

        decline to provide the optional <br>

        > demographic data, we are on track to collect enough "full"

        surveys (i.e., including <br>

        > demographic data) to surpass a 3% confidence interval at

        the 99% confidence level.[5]<br>

        <br>

        What makes you believe that the answers follow a normal

        distribution?<br>

        It will be interesting to see if it looks like a normal

        distribution.</blockquote>

      <br>

    <p></p>

    <div>-- <br>

      -------<br>

      <i>Allan Mustard, Chairperson</i><br>

      <i>Board of Directors</i><br>

      <i>OpenStreetMap Foundation</i></div>

  </div>


</blockquote></div>
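<div dir="ltr"><p>The self-selection point made at the top of this reply can be illustrated with a small simulation, sketched here in Python. Everything in it is an assumption chosen for illustration, not OSM data: the population size, the hypothetical <code>engagement</code> score, the assumed link between engagement and answering "yes", and the function name <code>simulate</code>. It compares a voluntary survey, where more engaged members are more likely to respond, against a random sample of 1000.</p>

<pre>
```python
import random


def simulate(pop_size=100_000, response_rate=0.2, random_n=1000, seed=0):
    """Compare a self-selected survey with a simple random sample.

    All numbers are made up for illustration only.
    """
    rng = random.Random(seed)

    # Each hypothetical member has an engagement score in [0, 1].
    engagement = [rng.random() for _ in range(pop_size)]

    # Assumption: more engaged members are more likely to answer "yes".
    answers = [1 if 0.3 + 0.4 * e > rng.random() else 0 for e in engagement]
    true_share = sum(answers) / pop_size

    # Voluntary survey: the chance of responding grows with engagement.
    volunteers = [a for a, e in zip(answers, engagement)
                  if response_rate * e > rng.random()]
    voluntary_share = sum(volunteers) / len(volunteers)

    # Random sample: every member is equally likely to be picked.
    sample = rng.sample(answers, random_n)
    random_share = sum(sample) / random_n

    return true_share, voluntary_share, len(volunteers), random_share


# Raising the response rate enlarges the voluntary sample,
# but it does not change who tends to volunteer.
for rate in (0.05, 0.2, 1.0):
    true_share, vol_share, n_vol, rand_share = simulate(response_rate=rate)
    print(f"rate={rate}: true={true_share:.3f} "
          f"voluntary={vol_share:.3f} (n={n_vol}) "
          f"random={rand_share:.3f} (n=1000)")
```
</pre>

<p>Under these assumptions the voluntary estimate should stay offset from the true share by several percentage points however many volunteers respond, while the random sample of 1000 should typically land closer, within ordinary sampling error.</p></div>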