[Talk-us] Fixing TIGER street name abbreviations
Dale Puch
dale.puch at gmail.com
Thu May 10 20:28:00 BST 2012
As a quick and dirty test I took Florida and Illinois road data from
CloudMade. A simple replace of the top 7 or so suffixes, matched only at the
end of the name and with a space in front, resulted in over 700,000 name
changes for those 2 states alone, and that did not include all the names with
cardinals (prefix and suffix) that need expanding. It was well over 80% of
the names. Anyone arguing against scripting these changes should spend a
day or two trying to do that by hand and get back to us on how they feel
afterwards.
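
For illustration only, the kind of replace described above could be sketched
in Python roughly as follows. This is not the script used for the test: the
suffix table is an assumed example of "the top 7 or so", and the names
expand_suffix and count_changes are hypothetical.

```python
import re

# Assumed example table; the actual suffixes used in the test are not listed.
SUFFIXES = {
    "St": "Street",
    "Ave": "Avenue",
    "Rd": "Road",
    "Dr": "Drive",
    "Ln": "Lane",
    "Ct": "Court",
    "Blvd": "Boulevard",
}

# Match an abbreviation only at the very end of the name, preceded by a space.
SUFFIX_RE = re.compile(r" (" + "|".join(SUFFIXES) + r")$")


def expand_suffix(name):
    """Expand a trailing suffix abbreviation; leave everything else alone."""
    return SUFFIX_RE.sub(lambda m: " " + SUFFIXES[m.group(1)], name)


def count_changes(names):
    """Count how many names the replace would alter, without altering any."""
    return sum(1 for n in names if expand_suffix(n) != n)
```

Names with a trailing cardinal (such as "Main St E") are deliberately left
untouched here, which matches the note that those were not included in the
count.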
On Thu, May 10, 2012 at 2:09 PM, stevea <steveaOSM at softworkers.com> wrote:
> I support this methodology in the sense of it being "Vet, then set." (Vet
> being a verb which my dictionary says means "make a careful and critical
> examination of something.")
>
> Sure, saying "reasonably simple grep search and replace" is a bit vague,
> but I'm not talking about the specifics of this, or any one particular,
> search, just that doing it to an offline copy and then vetting the results
> (having our community "discuss, agree, disagree, improve and finalize")
> sounds like more of the sort of "community consensus workflow steps" that I
> know are going to produce both harmony and great results.
>
> THEN upload (set).
>
> Does this mean I suggest precluding individual edit contributions that
> have not been more-widely vetted? Of course not: we do this all the time.
> But as individuals, we just do it on the small scale. It is when we do it
> on the large scale (as in massive TIGER search and replaces) that I'm
> saying "Vet, then set" should be done.
>
> This project, its data, and its interaction amongst us as individual
> contributors in achieving harmonious consensus can only get better. We do a
> fair-to-good job now, let's make that "largely a great job" more so in the
> future.
>
> SteveA
> California
>
>
>
>> The error rate is directly related to how much testing and review is
>> done. 1/1,000 is by no means a set error rate for either manual or bot
>> edits.
>>
>> Reasonably simple grep search and replace will correctly expand the
>> example. The default should and can be to not expand unless it meets
>> specific requirements.
>> Dr is only expanded to Drive if it is at the end of the name, or second
>> to last and followed by a cardinal direction (S, E, W, N, etc.), but left
>> alone (or set to Doctor) if nothing is in front of it. Let the bot get the
>> easy stuff, and then report on the unknowns for manual edits.
>>
>> Run the grep on a copy of the DB, and do reports on the changes. Review
>> just the changed street names before and after for quality control. Let
>> others review it as well. Once it is ironed out make the changes in the
>> live DB. I would guess the error rate after that would be well under
>> 1/1,000,000.
>>
>> Either way you can get an idea about the edits without doing anything to
>> the live database.
>> Dale Puch
>>
>
>
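
The "Dr" rule quoted above, together with the before/after report, could be
sketched along these lines. It is a minimal sketch under stated assumptions:
the cardinal list, the status labels, and the names expand_dr and report are
all hypothetical, and it works on a plain list of names rather than a real
copy of the DB.

```python
# Assumed set of cardinal abbreviations; extend as the data requires.
CARDINALS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}


def expand_dr(name):
    """Return (new_name, status); status is 'changed', 'unchanged' or 'review'.

    Default is to not expand unless the name meets the specific rule.
    """
    words = name.split()
    positions = [i for i, w in enumerate(words) if w == "Dr"]
    if not positions:
        return name, "unchanged"
    last = len(words) - 1
    for i in positions:
        at_end = i == last
        before_cardinal = i == last - 1 and words[last] in CARDINALS
        # Something must be in front of "Dr" for it to mean Drive.
        if i > 0 and (at_end or before_cardinal):
            words[i] = "Drive"
            return " ".join(words), "changed"
    # "Dr" is present but the rule does not apply: leave it for a human.
    return name, "review"


def report(names):
    """Collect before/after pairs and the unknowns, without touching the data."""
    changed, review = [], []
    for n in names:
        new, status = expand_dr(n)
        if status == "changed":
            changed.append((n, new))
        elif status == "review":
            review.append(n)
    return changed, review
```

The changed list is what would be reviewed before anything is uploaded, and
the review list is the set of unknowns left for manual edits.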
--
Dale Puch