[Imports] Importing Kerala, India road network from Facebook's ML generated data

Frederik Ramm frederik at remote.org
Wed Aug 29 15:42:15 UTC 2018


Hi,

I would like to use this as an opportunity to offer a few general remarks.

It is clear that OSM is no place to import "machine learning" results
without thorough checks by people familiar with the situation on the
ground. I think that, at least for the moment, everyone agrees with that
statement.

What we're seeing, though, is the "white-washing" (for lack of a more
culturally neutral term... could I say "community-washing"?) of
machine-learning or other low-quality data, in various ways:

* people suggest importing low-quality machine-learning data and
promise a manual review and improvements during the import process, but
in reality the focus later shifts to quantity over quality, and
individuals "review" tens of thousands of objects a day - which is not
what most people would understand a review to be.

* people create derived works from the machine-learning data, e.g.
aggregate low-quality building traces into "residential areas" and then
import them, again with quality control only at the level of spot
checks.

This often happens because there is no good collaboration platform for
fixing errors before the data goes into OSM; the hope is that *after*
the data has gone into OSM, "the community" - here used as a nebulous
term that often means "other people, not us" - can and will fix things.

This is a trend that we should be wary of. Looking at the GitHub
issue, I do find a few "hopeful" statements there that would raise an
alarm for me: "Floating roads - easy to fix" (yes, but who does it?),
"we usually take care of all those mentioned fixes in our editing
process" (from Drishtie - I am unsure what "usually" means and how much
time Facebook have agreed to spend on this), "Yes we can go. May be
some problems will be fixed later", and so on.

As always, the concrete discussion is made difficult by an existing
disaster condition, where anyone who says "wait a minute" feels the
pressure of standing in the way of humanitarians saving lives. My
respect to Christoph for taking a principled stand here and separating
the aspects of "due process" and "humanitarian situation".

I'm loath to oppose the concrete project, but I think we really have to
be stricter here if we do not want to become a rubbish dump in the
long run. We can't call for Facebook's machine-learning output to be
imported every time there's a natural disaster somewhere.

Every import should have a post-import review in which, after six
months or a year, we actually analyze what has happened:

* how much has been imported (compared to what was planned)?
* have the quality checks/controls that were promised/hoped for during
the planning actually materialised, or has problematic data been waved
through with just spot checks?
* has the data been healthily assimilated by a local community working
on OSM, or does it just sit there and rot away?

Such an "import health check" could then lead to concrete projects to
improve the import if it is deemed problematic, or, in drastic cases,
to a decision to remove the import again if it turns out not to have
been helpful.
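
To make such a check concrete: below is a minimal sketch (in Python,
against the public Overpass API) of how one could count import-created
highways that nobody has edited since the import. The account name
"import_account_name" and the bounding box are placeholder assumptions,
not values taken from this thread.

  import requests  # assumes the third-party 'requests' package is installed

  OVERPASS_URL = "https://overpass-api.de/api/interpreter"

  # Placeholders: substitute the real import account and a bounding box
  # for the area of interest (here, a rough Kerala extent).
  QUERY = """
  [out:json][timeout:120];
  way["highway"](user:"import_account_name")(8.2,74.8,12.8,77.5);
  out meta;
  """

  resp = requests.post(OVERPASS_URL, data={"data": QUERY})
  resp.raise_for_status()
  ways = resp.json().get("elements", [])

  # With "out meta", each element carries its version number; version 1
  # means the way has not been edited since the import created it.
  untouched = [w for w in ways if w.get("version") == 1]
  print(f"{len(ways)} highways last touched by the import account, "
        f"{len(untouched)} still at version 1 (never edited since import)")

Note that the (user:...) filter matches the *last* editor of each way,
so the first number already excludes anything the local community has
since touched; comparing the two numbers over time would show whether
the data is being assimilated or just sitting there.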

Actually, this applies not only to imports but also to concerted
mapping efforts - to quote from a post that Pierre Beland made on
osm-talk
just yesterday: "The number of contributors is limited in Africa and the
risk is that errors created by mapathons while participating to Crisis
responses stay as is for years."

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00'09" E008°23'33"


