As a quick and dirty test I took Florida and Illinois road data from CloudMade.  A simple replace of the top 7 or so suffixes, matched only at the end of the name and with a space in front, resulted in over 700,000 name changes for those 2 states alone, and that did not include all the names with cardinals (prefix and suffix) that need expanding.  That was well over 80% of the names.  Anyone arguing against scripting these changes should spend a day or two trying to do them by hand and then tell us how they feel afterwards.<br>
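<br>
For anyone who wants to try the same kind of count on their own extract, here is a rough sketch of that sort of replace, not the exact script I ran. The suffix table is only an illustrative guess at a "top 7", and the function names are made up for this example. It only counts what would change and touches nothing.<br>
<pre>
```python
import re

# Hypothetical abbreviation table: an illustrative guess at a "top 7",
# not the actual list used in the test described above.
SUFFIXES = {
    "St": "Street",
    "Ave": "Avenue",
    "Rd": "Road",
    "Dr": "Drive",
    "Ln": "Lane",
    "Ct": "Court",
    "Blvd": "Boulevard",
}

# Match a known suffix only when it has a space in front of it
# and sits at the very end of the name.
SUFFIX_RE = re.compile(r" (" + "|".join(SUFFIXES) + r")$")


def expand_suffix(name):
    """Return the name with a trailing abbreviated suffix expanded, else unchanged."""
    return SUFFIX_RE.sub(lambda m: " " + SUFFIXES[m.group(1)], name)


def count_changes(names):
    """Count how many names would change, without modifying anything."""
    return sum(1 for n in names if expand_suffix(n) != n)
```
</pre>
Names with a trailing cardinal (for example "Oak Dr S") are deliberately not matched here, which is why they were not part of the count.<br>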

<br><br><br><div class="gmail_quote">On Thu, May 10, 2012 at 2:09 PM, stevea <span dir="ltr"><<a href="mailto:steveaOSM@softworkers.com" target="_blank">steveaOSM@softworkers.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I support this methodology in the sense of it being "Vet, then set." (Vet being a verb which my dictionary says means "make a careful and critical examination of something.")<br>

<br>

Sure, saying "reasonably simple grep search and replace" is a bit vague, but I'm not talking about the specifics of this, or any one particular, search, just that doing it to an offline copy and then vetting the results (having our community "discuss, agree, disagree, improve and finalize") sounds like more of the sort of "community consensus workflow steps" that I know are going to produce both harmony and great results.<br>


<br>

THEN upload (set).<br>

<br>

Does this mean I suggest precluding individual edit contributions that have not been more-widely vetted?  Of course not:  we do this all the time.  But as individuals, we just do it on the small scale. It is when we do it on the large scale (as in massive TIGER search and replaces) that I'm saying "Vet, then set" should be done.<br>


<br>

This project, its data, and its interaction amongst us as individual contributors in achieving harmonious consensus can only get better. We do a fair-to-good job now; let's make that "largely a great job" in the future.<br>


<br>

SteveA<br>

California<br>

<br>

<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">

The error rate is directly related to how much testing and review is done.  1/1,000 is by no means a set error rate for either manual or bot edits.<br>

<br>

Reasonably simple grep search and replace will correctly expand the example.  The default should and can be to not expand unless it meets specific requirements.<br>

Dr is only expanded to Drive if it is at the end of the name, or second to last and followed by a cardinal direction (S, E, W, N, etc.), but left alone (or set to Doctor) if nothing is in front of it.  Let the bot get the easy stuff, and then report on the unknowns for manual edits.<br>
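<br>
The Dr rule above could be sketched roughly as follows. This is only an illustration under assumptions: the cardinal list, the function name and the status labels are invented for the example, and anything that does not meet the rule is left untouched and flagged for manual review rather than guessed at.<br>
<pre>
```python
# Assumed set of cardinal abbreviations; extend as needed.
CARDINALS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}


def expand_dr(name):
    """Return (new_name, status); status is 'expanded', 'unchanged' or 'review'."""
    words = name.split()
    if "Dr" not in words:
        return name, "unchanged"
    last = len(words) - 1
    # Position of the final "Dr" in the name.
    idx = last - words[::-1].index("Dr")
    at_end = idx == last
    before_cardinal = idx == last - 1 and words[last] in CARDINALS
    # Expand only when something is in front of "Dr" and it is at the end,
    # or second to last and followed by a cardinal direction.
    if idx != 0 and (at_end or before_cardinal):
        words[idx] = "Drive"
        return " ".join(words), "expanded"
    # Default: do not expand, report for a manual edit instead.
    return name, "review"
```
</pre>
With this sketch, "Oak Dr S" becomes "Oak Drive S", while "Dr Martin Luther King Blvd" is left as is and reported for review.<br>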


<br>

Run the grep on a copy of the DB, and do reports on the changes. Review just the changed street names before and after for quality control.  Let others review it as well.  Once it is ironed out, make the changes in the live DB.  I would guess the error rate after that would be far better than 1 in 1,000,000.<br>
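<br>
A before/after report of that kind might look something like the sketch below. It assumes the names have already been pulled out of an offline copy into a plain list; change_report and demo_transform are hypothetical names for this example, and demo_transform is only a toy stand-in for whatever replace is being reviewed.<br>
<pre>
```python
from collections import Counter


def demo_transform(name):
    """Toy stand-in for the real replace: expands a trailing ' St' only."""
    return name[:-3] + " Street" if name.endswith(" St") else name


def change_report(names, transform):
    """Return (before, after, count) rows for every name the transform would change."""
    counts = Counter()
    for before in names:
        after = transform(before)
        if after != before:
            counts[(before, after)] += 1
    # Most frequent changes first, then alphabetical by the original name.
    return sorted(
        ((before, after, n) for (before, after), n in counts.items()),
        key=lambda row: (-row[2], row[0]),
    )
```
</pre>
Only the changed names show up in the report, so reviewers read a short list of distinct before/after pairs instead of every street in the state.<br>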


<br>

Either way you can get an idea about the edits without doing anything to the live database.<br></div>

Dale Puch<br>

</blockquote>

<br>

</blockquote></div><br><br clear="all"><br>-- <br>Dale Puch<br>