[OSM-talk] Report on the OSMF 2021 Survey after One Week

Allan Mustard allan.mustard at osmfoundation.org
Sun Jan 24 14:11:42 UTC 2021


> What are the assumptions for that? Given inherently biased samples (biased in a way
> that does not allow for balancing/adjusting; the survey is made because the demographics are unknown!),
> I am pretty sure that it is misleading to include such statistics.
>
> We do not know how strong the bias is, or even its type, and we can only guess how the sample
> of survey-takers is biased compared to the general community (and there is the traditional
> problem of people trying to manipulate the survey by taking it multiple times,
> strategic refusal to answer some questions about demographics, or just lying).
>
> For example if some group is overrepresented or underrepresented compared to
> demographic of OSM mappers this survey is unable to detect it!  
>
> In a real situation, even with 5,000 people answering, it is likely that the
> real confidence interval may be 20% or 10% or 1%, and we do not know
> which one would be correct!
Mateusz, et al.,

Our first assumption is that disseminating information through a
multitude of communications channels will reach a much less biased
sample than if we were only to announce the survey via the talk lists. 
This is why the Board is spreading the word about the survey via social
media plus direct emailing to user groups, to local chapters and
communities, to working groups, via WeeklyOSM, the website, and the wiki
calendar, and is asking all who read these announcements to propagate
them farther via any and all channels at their disposal.  It is why the
Board has had the survey translated from original English into 15 other
languages (so far), the largest effort to translate an OSMF survey to
date.[1]  It is also the rationale behind running the survey for nearly
one month, which is an unusually long duration for a survey.  Local
communities have advised us that a month is needed to disseminate
information of this nature, particularly in regions with difficult
Internet access.  At present we are averaging over 250 responses per day
to the survey.

Second, when the data are in, we plan to attempt to normalize the survey
data against known quantifiable data on mapping activity, such as data from
OSMstats.[2]  This will help address geographic biases in the sample.
The major reason we have asked for demographic data is to assess bias in
the data against known characteristics of the community as can be
derived from OSMstats, but we suspect that the demographic data will
also be, in and of themselves, of interest to much of the community.  I
point out that there are some academic studies of the structure of the
OSM community, and we will examine them as well with an eye to
normalizing any selection biases we can.[3][4]
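
One common way to do this kind of normalization is post-stratification: each response is weighted by the ratio of its group's share in the reference data to its share in the sample. The sketch below is only an illustration of that idea, not the Board's chosen method. The region names, counts, and shares are hypothetical placeholders, not figures from OSMstats or from the survey.

```python
def poststratification_weights(sample_counts, population_shares):
    """Return one weight per group: population share / sample share.

    sample_counts: responses received per group (hypothetical data).
    population_shares: each group's share of the reference population,
        e.g. of active mappers (hypothetical data; should sum to 1).
    """
    total = sum(sample_counts.values())
    weights = {}
    for group, count in sample_counts.items():
        if count == 0:
            # A group with no responses cannot be reweighted at all.
            continue
        sample_share = count / total
        weights[group] = population_shares[group] / sample_share
    return weights


# Hypothetical example: region_a is overrepresented among respondents.
sample_counts = {"region_a": 60, "region_b": 40}
population_shares = {"region_a": 0.5, "region_b": 0.5}
weights = poststratification_weights(sample_counts, population_shares)
```

Responses from an overrepresented group receive a weight below 1 and those from an underrepresented group a weight above 1. This corrects only for characteristics that are actually measured in both the survey and the reference data; it cannot remove biases along dimensions that are unknown.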

Third, while indeed about one quarter of respondents decline to provide
the optional demographic data, we are on track to collect enough "full"
surveys (i.e., including demographic data) to achieve a confidence
interval narrower than plus or minus 3% at the 99% confidence level.[5]
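
For reference, the arithmetic behind a target like this can be sketched with the standard normal-approximation formula for a proportion. This is a minimal sketch under the assumption of a simple random sample and the worst-case proportion p = 0.5; it says nothing about selection bias, which the formula does not capture. The function names are illustrative.

```python
import math
from statistics import NormalDist


def margin_of_error(n, confidence=0.99, p=0.5):
    """Half-width of the confidence interval for a proportion.

    Assumes a simple random sample of size n (normal approximation).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(margin, confidence=0.99, p=0.5):
    """Smallest n whose margin of error does not exceed `margin`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


n_needed = required_sample_size(0.03, confidence=0.99)
```

Under these assumptions, a margin of 3% at the 99% level calls for about 1,844 complete responses.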

Fourth, as for taking the survey multiple times, this is indeed a
possibility if individuals have multiple email addresses.  However, we
choose to assume, in line with the Etiquette policy, "good faith" on the
part of members of the OSM community.[6]  By and large the Board
believes most members of the community who participate in the survey
will do so in good faith.  That said, the reason respondents are
required to register with an email address is exactly that--to
discourage "gaming" the survey by responding multiple times.

Fifth, the Board is aware that there will be some inherent selection
biases.  The survey is not distributed on paper; hence, it is
selectively biased toward individuals with access to a computer and the
internet.  We do not consider that a serious bias because those tools
are required for contributing to or using OpenStreetMap.  The survey is
not available in every language used in the community, so it is
selectively biased toward speakers of one of its 16 languages (not
necessarily English).  It is selectively biased toward those who follow
social media, are plugged into local chapters and communities, read
WeeklyOSM, read the wiki landing page, or map (since the banner went up
on openstreetmap.org). The survey is selectively biased against those
who strictly map or use the data and are uninterested in OSMF policy
matters, since they are less likely to take time to respond to the
survey.  If you can identify other selection biases inherent in the
survey, please share them with the Board.

cheers,
apm

[1] https://wiki.openstreetmap.org/wiki/Foundation/Surveys  The only
previous survey to have been translated from English was the August 2019
survey for the local chapter congress, which was translated into 11
languages, of which 8 were European languages and 2 were Chinese in
variant scripts; the other language was Persian.  The current survey is
available in 7 Asian languages (Chinese, Indonesian, Japanese, Korean,
Persian, Turkish, Vietnamese) as well as 9 European languages.
[2] https://osmstats.neis-one.org/
[3] https://link.springer.com/article/10.1007/s10708-019-10035-z
[4] https://2018.stateofthemap.org/2018/A09-Surveying_OSM_contributors__Learning_from_the_community/
[5] http://www.stat.yale.edu/Courses/1997-98/101/confint.htm
[6] https://wiki.openstreetmap.org/wiki/Etiquette

-------

/Allan Mustard, Chairperson/
/Board of Directors/
/OpenStreetMap Foundation/