[Talk-us] OSM Data Quality

Clifford Snow clifford at snowandsnow.us
Sat Jun 1 02:56:51 UTC 2013


On Fri, May 31, 2013 at 3:02 PM, stevea <steveaOSM at softworkers.com> wrote:

> Clifford Snow writes:
>
> First you need to define what good data quality is and second, you need to
> collect data to measure data quality. Once good data is collected, then start
> determining the root cause of the problem.
>
>
> Most of what I see is anecdotal evidence of problems. Fixing the cause of
> those problems is good, but it may not get at the underlying issues.
>
>
> I say +1 to this, but it is so nebulous as to be only broadly helpful.
> Clifford, care to flesh that out a bit?
>

You mean you could sense what I was trying to say?  Needless to say, I tend
to be a bit terse with my emails. So let me try a slightly longer version.

We need quality standards that can be measured.  We can and should have
standards for mapping objects and ways. With those standards, a quality
control sampling process could be set up to test the quality of new
edits as well as the existing data. With a sample of data we could build
a histogram of errors, and ideally tackle the largest column first. Even a
small sample size can work. Statistical Process Control in a manufacturing
process only samples some 20 items. This isn't a manufacturing process, but
the principles are the same.
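
To make the sampling idea a bit more concrete, here is a rough Python
sketch. It assumes each sampled edit has already been reviewed by hand and
given an error category. The field names ("id", "error") and the category
names are made up for illustration; they are not part of any OSM tool.

```python
import random
from collections import Counter


def sample_edits(edits, sample_size=20, seed=None):
    """Draw a simple random sample of edits, without replacement."""
    rng = random.Random(seed)
    k = min(sample_size, len(edits))
    return rng.sample(edits, k)


def error_histogram(reviewed_edits):
    """Count error categories, largest first.

    Edits whose "error" field is None are treated as error-free.
    """
    counts = Counter(
        e["error"] for e in reviewed_edits if e["error"] is not None
    )
    return counts.most_common()


# Hypothetical, hand-reviewed edits (illustrative data only).
edits = [
    {"id": 1, "error": "wrong_tag"},
    {"id": 2, "error": None},
    {"id": 3, "error": "wrong_tag"},
    {"id": 4, "error": "misplaced_node"},
    {"id": 5, "error": "wrong_tag"},
    {"id": 6, "error": "misplaced_node"},
    {"id": 7, "error": "broken_relation"},
    {"id": 8, "error": None},
]

if __name__ == "__main__":
    sample = sample_edits(edits, sample_size=5, seed=42)
    for category, count in error_histogram(sample):
        # One '#' per error gives a crude text histogram.
        print(f"{category:16} {'#' * count}")
```

The first row of the output is the largest column, which is the one to
chase for a root cause.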

Unfortunately, some of what we do is subjective. Take the issue of
tagging Subway sandwich shops that was recently discussed on one of the
mailing lists. Everyone had a valid solution. Maybe some were more valid
than others, but any one of them was workable. Yet tagging POIs is an
important step to get right.

Adding a node that says this is a bus stop when it isn't is very clearly a
data quality issue. It can be measured. The path of a highway can be
checked against GPS traces or Bing imagery. It can be measured. However,
whether it is accurately tagged as primary, secondary, tertiary, etc. is
somewhat subjective.
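
As an illustration of the measurable case, here is a minimal Python sketch
of checking a way against a GPS trace. It assumes both are plain lists of
(lat, lon) pairs in degrees, and it only compares each way node to the
nearest trace point, which is cruder than a proper point-to-line distance.
The function names and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def max_deviation_m(way_nodes, trace_points):
    """Largest distance from any way node to its nearest trace point."""
    return max(
        min(haversine_m(node, pt) for pt in trace_points)
        for node in way_nodes
    )


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    way = [(47.6000, -122.3300), (47.6010, -122.3300)]
    trace = [(47.6000, -122.3301), (47.6005, -122.3301), (47.6010, -122.3302)]
    print(f"max deviation: {max_deviation_m(way, trace):.1f} m")
```

What counts as an acceptable deviation would itself have to be one of the
agreed standards.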

Tackling the subjective is more difficult. For example, the Subway sandwich
shop. If we had a hard and fast rule that every Subway be tagged as
amenity=fast_food, then we could easily do a quality check. But OSM gives
people a lot of tagging freedom.
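
If such a rule did exist, the check itself would be simple. Here is a
small Python sketch, assuming each map element is a dict with an "id" and
a "tags" dict. That layout and the rule are hypothetical, and the tag is
written amenity=fast_food, as it is spelled in OSM. Matching on the name
alone is naive and would need refining.

```python
def check_subway_rule(elements):
    """Return ids of elements named 'Subway' that lack amenity=fast_food.

    This encodes the hypothetical rule discussed above, not an actual
    OSM tagging requirement.
    """
    violations = []
    for el in elements:
        tags = el.get("tags", {})
        is_subway = tags.get("name", "").strip().lower() == "subway"
        if is_subway and tags.get("amenity") != "fast_food":
            violations.append(el["id"])
    return violations


# Made-up elements, for illustration only.
elements = [
    {"id": 101, "tags": {"name": "Subway", "amenity": "fast_food"}},
    {"id": 102, "tags": {"name": "Subway", "amenity": "restaurant"}},
    {"id": 103, "tags": {"name": "Subway", "shop": "deli"}},
    {"id": 104, "tags": {"name": "Joe's Diner", "amenity": "restaurant"}},
    {"id": 105, "tags": {}},
]

if __name__ == "__main__":
    print(check_subway_rule(elements))
```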

One last thing. My sense is that the problem generally isn't the mappers.
Yes, I've screwed up more than my fair share of edits. But most problems are
system problems. To fix those we need good data and a willingness to get at
the root cause of the problem.

Short summary: sample edits, categorize errors, determine root cause, then
fix root cause. That process will drastically improve the quality of OSM.
Hopefully someone with more recent background in Quality Control can step
in here to help me out.

-- 
Clifford

OpenStreetMap: Maps with a human touch