[OSM-talk] Data Quality
Jean-Marc Liotier
jm at liotier.org
Mon Jan 2 12:04:26 UTC 2017
On Fri, 30 Dec 2016 18:30:06 -0500
john whelan <jwhelan0112 at gmail.com> wrote:
>
> In HOT in theory new users work is validated.
> In practise its only when a tile is completed
> and even then most tiles aren't checked.
Thank you... I'm sincerely glad you recognize this issue in HOT
contributions: from my window (mostly Senegal and a bit of Mali) I have
a rather dim view of data quality in changesets with a HOT hashtag. I
don't hold that against novice contributors, in part because I don't want
to kill their enthusiasm and in part because they don't know what they
are doing - so I feel that increased emphasis on data quality would be
a most responsible course of action for HOT.
From what I have witnessed, there seems to be a quantitative emphasis
in reporting about HOT projects (kilometers of roads, number of
buildings). While that makes for impressive presentations, it may be a
misleading metric: the usefulness of data may have more to do with its
quality than with its quantity. Coming from an enterprise background,
I know the difficulty of weaning oneself from the addictiveness of
spectacular metrics, but I also know how much they hurt the bottom line.
I have made quality assurance and the use of quality assurance tools
such as http://osmose.openstreetmap.fr or the JOSM validator a core part
of my message to budding advanced contributors - but of course such
intimidating tools cannot be pushed to novice contributors.
Nevertheless, there is no reason to withhold quality assurance from
those who need it most, though it might require changes in both tools
and processes.
For example, might the validation status of a Tasking Manager tile be
tied to the number of errors in it? That would require integration of
something like Osmose into the Tasking Manager, with a short validation
delay unlike the daily batch basis of typical Osmose operation... But
that is the sort of change that would make quality assurance a
first-class citizen of HOT contributions.
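
To make the idea concrete, here is a minimal sketch in Python of what such
a gate might look like. Everything in it is hypothetical: the Issue and
Tile classes, the tile_status function, the status names and the
max_errors threshold are illustrative assumptions, not part of the
Tasking Manager or of Osmose. It also assumes the detected issues have
already been fetched from some quality assurance tool as points with a
latitude and a longitude.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """A detected data problem located at a point (hypothetical structure)."""
    lat: float
    lon: float
    kind: str


@dataclass(frozen=True)
class Tile:
    """A task tile described by its bounding box (hypothetical structure)."""
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def contains(self, issue: Issue) -> bool:
        # Simple bounding-box test; boundaries are inclusive.
        return (self.min_lat <= issue.lat <= self.max_lat
                and self.min_lon <= issue.lon <= self.max_lon)


def tile_status(tile, issues, max_errors=0):
    """Return (status, error_count) for a tile.

    The tile may move on to validation only when the number of issues
    located inside it does not exceed max_errors (an assumed threshold).
    """
    count = sum(1 for issue in issues if tile.contains(issue))
    status = "ready_for_validation" if count <= max_errors else "needs_fixing"
    return status, count


if __name__ == "__main__":
    tile = Tile(min_lat=14.6, min_lon=-17.5, max_lat=14.8, max_lon=-17.3)
    issues = [
        Issue(14.7, -17.4, "unconnected highway"),   # inside the tile
        Issue(12.6, -8.0, "overlapping building"),   # outside the tile
    ]
    print(tile_status(tile, issues))
```

A real integration would still need the short validation delay mentioned
above, and a threshold that novices can actually reach.
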
Also I wonder if, sometimes, it would be wiser to refrain from
collecting data that will certainly have very low quality and
questionable usefulness - buildings come to mind, and I feel that
polygons with landuse tags would often be more cost-effective than
individual building outlines.
In any case, and not just about HOT, I welcome a debate on how to embed
quality assurance in the contribution process of the most novice
contributors.