<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Fri, May 31, 2013 at 3:02 PM, stevea <span dir="ltr"><<a href="mailto:steveaOSM@softworkers.com" target="_blank">steveaOSM@softworkers.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="im"><div>Clifford Snow writes:</div>

<blockquote type="cite">First you need to define what good data

quality is and second, you need to collect data to measure data

quality. Once good data is collected, then start determining the root cause

of the problem.</blockquote>

<blockquote type="cite"><br></blockquote>

<blockquote type="cite">Most of what I see is anecdotal evidence

of problems. Fixing the cause of those problems is good, but it may

not get at the underlying issues.</blockquote>

<div><br></div>

</div><div>I say +1 to this, but it is so nebulous as to be only broadly

helpful.  Clifford, care to flesh that out a bit?</div>

<div></div></blockquote></div><br>You mean you could sense what I was trying to say?  Needless to say, I tend to be a bit terse with my emails. So let me try a slightly longer version.</div><div class="gmail_extra"><br></div>


<div class="gmail_extra" style>We need quality standards that can be measured.  We can and should have standards for mapping objects and ways. With those standards, a quality control sampling process could be set up to test the quality of new edits as well as the existing data. With a sample of data we could build a histogram of errors by category and, ideally, tackle the largest column first. Even a small sample size can work. Statistical Process Control in a manufacturing process only samples some 20 items. This isn't a manufacturing process, but the principles are the same. </div>


<div class="gmail_extra"><br></div><div class="gmail_extra">Unfortunately, some of what we do is subjective. Take the issue of tagging Subway sandwich shops that was recently discussed on one of the mailing lists. Everyone had a valid solution. Maybe some were more valid than others, but any one of them was workable. Yet tagging POIs is an important step to get right. </div>


<div class="gmail_extra"><br></div><div class="gmail_extra">Adding a node that says this is a bus stop when it isn't is very clearly a data quality issue. It can be measured. Whether the path of a highway tracks GPS traces or Bing imagery can be checked. It can be measured. However, whether it is accurately tagged as primary, secondary, tertiary, etc. is somewhat subjective. </div>


<div class="gmail_extra"><br></div><div class="gmail_extra">Tackling the subjective is more difficult. Take the Subway sandwich shop again. If we had a hard and fast rule that every Subway be tagged as amenity=fast_food, then we could easily do a quality check. But OSM gives people a lot of tagging freedom. </div>


<div class="gmail_extra"><br></div><div class="gmail_extra">One last thing. My sense is that the problem generally isn't the mappers. Yes, I have screwed up more than my fair share of edits. But most problems are system problems. To fix those we need good data and a willingness to get at the root cause of the problem. </div>


<div class="gmail_extra"><br></div><div class="gmail_extra">Short summary: sample edits, categorize the errors, determine the root cause, then fix the root cause. That process will drastically improve the quality of OSM. Hopefully someone with a more recent background in Quality Control can step in here to help me out.<br>


<div><br></div>-- <br><div>Clifford</div><div><br></div><div>OpenStreetMap: Maps with a human touch</div>

</div></div>