[Imports] Fwd: Scientific paper on "Information Seeding"

Tue Jul 3 21:52:57 UTC 2018

I had the chance to visit with Abhishek in Boulder last year. How he came
up with his findings is interesting. He looked at TIGER imports. People
assumed that TIGER is TIGER, but in reality the quality of TIGER in 2006
depended very much on the county that produced the source material and on
how much cleanup Census did to the data. This difference allowed him to
study how the quality of the seeded data affected community growth. Here
is an excerpt from Abhishek's paper:

"Unbeknownst to the OpenStreetMap contributors, the US Census was itself in
the process of updating and correcting a mostly outdated and incomplete
TIGER map in preparation for the 2010 census. Consequently, the 2006
version of the TIGER map that was used by OpenStreetMap contained accurate
and complete information for only about 60% of the approximately 3,100
counties in the United States. Information for the remaining 40%, provided
largely out-of-date and incomplete information. Thus, communities in about
60% of the counties in the US were seeded with a higher level of
information than the other 40% during OpenStreetMap’s formative years. The
high-information and low-information groups of counties were broadly
comparable along many other dimensions, such as their population and
income growth. I exploit this natural experiment to estimate the impact of
the level of information seeding on follow-on knowledge production in
online communities. Specifically, by comparing Treatment counties (those
that received the higher-quality TIGER map) with Control counties, combined
with micro-data on more than 350 million contributions between 2005 and
2014, I can estimate the causal effects of information seeding on long-run
outcomes within OpenStreetMap in a difference-in-difference framework."

This is a little off topic, but I find TIGER interesting. This year I've
been looking at Washington State TIGER data, comparing it to county data
where it is available. What I find is that TIGER data in some counties is
pretty much the same as it was in 2006, even though the counties update
their road networks monthly or more often. Those updates just don't get
sent to Census. One county wasn't even sending updates to their friends at
Esri, so the Esri basemap didn't match the county's own data. As you might
guess, these counties have only a bare minimum of people assigned to GIS
work.

I'm a strong believer that the right import can help. For instance, having
buildings and addresses helps with tools like Maps.Me, GoMap!! and OsmAnd
by helping people place nodes in the correct location. Without buildings
for context, it's just an empty space and hard to visualize. Addresses help
even more: with an address, it's just a matter of adding the appropriate
POI information. Roads shouldn't be imported, at least in the US; I can't
speak for other countries. While I stated above that the counties' road
data is much better than Census data, it's not perfect. I'd rather see
people trace the roads than import them. (That's why I created the
Washington State Roads background for iD and JOSM.)

The import process seems to work. (Although I still don't understand the
purpose of a separate user ID for imports. I hear the reason, but it just
doesn't make sense to me. Paul Norman has been trying to explain it to me
for years. Then again, I'm probably a slow learner.) Making sure the data
is properly licensed, that there is a good workflow, that the data is
good, and that it has buy-in from the local community is what makes it
work.

I'm including Abhishek in this discussion since I doubt he follows the
imports list. If I have in any way misinterpreted his findings, he can jump in.

Best,
Clifford

On Tue, Jul 3, 2018 at 12:45 PM Martijn van Exel <m at rtijn.org> wrote:

> Thanks for this follow up. I had not read that paper yet but had seen it
> had come out. I am familiar with Abhishek's other research and will be
> looking forward to sharing my take on it.
> --
>   Martijn van Exel
>   m at rtijn.org
>
> On Tue, Jul 3, 2018, at 11:46, Frederik Ramm wrote:
> > Hi,
> >
> > this (forwarded message below) is for Martijn, who in another thread
> > asked if I knew of any research that would back up my claim that "large
> > imports are often detrimental to community building". I believe the
> > author also presented at SotM-US last year.
> >
> > Of course, in addition to this diligent scientific research, there are
> > also the theoretical models and discussions in
> > http://www.asklater.com/matt/blog/2009/09/06/imports-and-the-community/
> > and the follow-on post, though these are hardly news!
> >
> > I've posted this in a separate thread in order not to further upset
> > Christoph ;)
> >
> > Bye
> > Frederik
> >
> > -------- Forwarded Message --------
> > Subject: Scientific paper on "Information Seeding"
> > Date: Mon, 9 Oct 2017 23:10:13 +0200
> > From: Frederik Ramm <frederik at remote.org>
> > To: Talk Openstreetmap <talk at openstreetmap.org>
> >
> > Hi,
> >
> > today I was pointed to a recent, open-access scientific paper called
> > "Information Seeding and Knowledge Production in Online Communities:
> > Evidence from OpenStreetMap". It is available here:
> >
> > https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3044581
> >
> > In the context of armchair mapping, but especially of data imports (and
> > recently, machine-generated OSM data) there's always been the discussion
> > between those who say "careful, too much importing will hurt the growth
> > of a local community", and others who say "this import is going to
> > kick-start a local community, let's do it!"
> >
> > Until now this has been a largely unproven matter of belief, and the
> > general mood is usually in favour of a quick build-up of data (through
> > remote mapping, importing, or machine learning) instead of a
> > take-it-slow approach that would wait for a community to form and take
> > matters into their own hands.
> >
> > The paper quoted above uses OSM as a research object and finds that in
> > certain ways imports in OSM have indeed harmed community growth. The
> > paper attempts to provide insights helpful for all kinds of
> > user-generated knowledge projects (not necessarily OSM), and
> > draws the following conclusion:
> >
> > "While information seeding could be useful to encourage the production
> > of distant forms of follow-on knowledge, it might demotivate and
> > under-provide more mundane and incremental follow-on information.
> > Accordingly, if managers are interested in leveraging pre-existing
> > information to spur the development of online communities, they might be
> > better served by withholding some pre-existing information and provide
> > community members with some space to create knowledge from scratch—even
> > if such knowledge already exists in an external source. This policy
> > allows community members to become invested in the community and develop
> > ownership over the knowledge."
> >
> > Bye
> > Frederik
> >
> > --
> > Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00'09" E008°23'33"
> >
> > _______________________________________________
> > Imports mailing list
> > Imports at openstreetmap.org
> > https://lists.openstreetmap.org/listinfo/imports
>

-- 
@osm_seattle
osm_seattle.snowandsnow.us
OpenStreetMap: Maps with a human touch