[Talk-us] name expansion bot (Re: Imports information on the wiki)

Fri Feb 17 20:32:39 GMT 2012

On 02/17/12 22:01, Paul Johnson wrote:
>
> On Fri, Feb 17, 2012 at 12:02 AM, Alan Mintz
> <Alan_Mintz+OSM at earthlink.net>
> wrote:
>
>     At 2012-01-15 05:35, Mike N wrote:
>
>         On 1/15/2012 8:28 AM, Nathan Edgars II wrote:
>
>             Actually the script also expanded the W to West. But my
>             point is that it
>             is a TIGER entry error, and any future script needs to take
>             into account
>             that these exist and people may have already fixed them to
>             the correct
>             names.
>
>
>           Agreed- if we're thinking of a bot that periodically fixes
>         everything, we need a special tag that says
>         "abbreviation_bot=back_off" (but perhaps not so verbose) -
>         something that tells the bot not to touch the name because it is
>         unusual and has been manually checked.
>
>
>     I hope there is no such bot being contemplated again. The last one
>     created lots of issues.
>
>
> Sounds like it would be better to come up with a more comprehensive
> algorithm for the bot, not outright deny the need for it altogether.
>   Granted, it did make minor messes in Oregon (where names with "St."
> meaning "Saint," "Santa/o" or "Sainte" are slightly more common) and
> Oklahoma (where single-letter street names are slightly more common),
> but overall, the automation saved countless hours of manual name
> expansion for the minor cost of having to deal with a very small number
> of largely regionally-isolated edge cases.

i would agree. on my usa visit, i'm doing some minor edits, and they 
have been in florida & chicago mainly. those street names are quite 
painful to deal with.

first, i suspect the cost of the damage done would still be way lower 
than the time spent on manual fixing. second, it should be possible to 
run the algorithm and put the results out for people to review. fix the 
things spotted, run it again and again, until no problems are reported, 
then just attack the data.
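
to make the "run it and put the results out for review" idea concrete, here is a minimal sketch of a dry run that only proposes changes and never writes anything. everything in it is an assumption for illustration: the tiny abbreviation tables, the function names, and the opt-out tag (borrowed from the "abbreviation_bot=back_off" suggestion quoted above, which is not an established tag). names starting with "St" are handed to a human, and a direction letter is expanded only when at least two more words follow, so a single-letter street name like "W St" keeps its letter.

```python
# Illustrative tables only; a real run would need a much larger, reviewed list.
SUFFIXES = {"St": "Street", "Ave": "Avenue", "Rd": "Road", "Blvd": "Boulevard", "Dr": "Drive"}
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}
OPT_OUT_TAG = "abbreviation_bot"  # hypothetical tag, taken from the quoted suggestion


def propose_expansion(tags):
    """Return (new_name, None), or (None, reason) when no change is proposed."""
    name = tags.get("name", "")
    if tags.get(OPT_OUT_TAG) == "back_off":
        return None, "opted out"
    words = name.split()
    if len(words) < 2:
        return None, "nothing to expand"
    out = list(words)
    first = words[0].rstrip(".")
    if first in DIRECTIONS and len(words) >= 3:
        # "W Main St": the leading letter is a direction prefix.
        out[0] = DIRECTIONS[first]
    elif first == "St":
        # "St Helens Rd": could be Saint, so leave it to a human.
        return None, "leading St may mean Saint"
    last = words[-1].rstrip(".")
    if last in SUFFIXES:
        out[-1] = SUFFIXES[last]
    new_name = " ".join(out)
    if new_name == name:
        return None, "nothing to expand"
    return new_name, None


def dry_run(elements):
    """Split elements into proposed renames and cases flagged for manual review."""
    proposed, flagged = [], []
    for el in elements:
        new_name, reason = propose_expansion(el["tags"])
        if new_name is not None:
            proposed.append((el["id"], el["tags"]["name"], new_name))
        elif reason != "nothing to expand":
            flagged.append((el["id"], el["tags"].get("name", ""), reason))
    return proposed, flagged
```

the proposed list is what reviewers would go through; the flagged list is what the script refuses to guess about.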

this is similar to what we did with komzpa for latvia (albeit on a much 
smaller scale, of course). there were a lot of problems like 
capitalisation mistakes, some common spelling mistakes, some common 
mistakes/silly approaches (like name=gas station - but not in english, 
of course :) ). we also had a couple of people who took the josm warning 
about unnamed objects way too seriously, thus we had loads of roads, 
lakes, power substations and other objects with name=. name=a name=l 
name=u aaaand so on.
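
a check for those placeholder names could look roughly like the sketch below. the function names and the element layout (a dict with "id" and "tags") are made up for illustration, and this is not komzpa's actual script. it only lists candidates: a one-character name can be legitimate in some places, so the result is a review list, not a delete list.

```python
def is_placeholder_name(name):
    """True for names that are empty or a single character after trimming."""
    return len(name.strip()) <= 1


def find_placeholder_names(elements):
    """Return ids of elements whose name tag looks like a placeholder."""
    return [
        el["id"]
        for el in elements
        if "name" in el["tags"] and is_placeholder_name(el["tags"]["name"])
    ]
```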
komzpa prepared the first batch run with his automated script, which was 
mostly tested in belarus, as far as i recall, and had some lithuanian 
changes as well. results were disastrous :)
if he had submitted that, i'd probably have reported him as a vandal ;D
so i took the xml and reviewed every single change in it, and reported 
all the problems i found. he applied some fixes and ran the script again. 
i reviewed it again and reported a smaller number of problems. he spent 
quite some time on those scripts, i spent probably 6-8 hours in total 
for all the review cycles. but the time spent manually to find and fix 
all those problems would be way, way more (or, more likely, most of the 
problems would never be fixed).

so, generate the change xmls, attack them as a group, fix mistakes and do 
an automated edit once nobody can find any problems.
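
for the "generate the change xmls" step, a rough sketch of writing proposed changes as an osmChange document that people can pass around and review. the input layout (dicts with "id", "version", "nodes" and the already-updated "tags") and the generator string are assumptions for illustration; it handles ways only and does not upload anything.

```python
import xml.etree.ElementTree as ET


def build_osmchange(ways):
    """Serialise modified ways into an osmChange XML string for review."""
    root = ET.Element("osmChange", version="0.6", generator="name-review-sketch")
    modify = ET.SubElement(root, "modify")
    for way in ways:
        el = ET.SubElement(modify, "way", id=str(way["id"]), version=str(way["version"]))
        # A modified way has to carry its full node list, not just the changed tag.
        for ref in way["nodes"]:
            ET.SubElement(el, "nd", ref=str(ref))
        for key, value in sorted(way["tags"].items()):
            ET.SubElement(el, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")
```

each review round would regenerate this file from the fixed script, so reviewers always look at exactly what would be submitted.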
-- 
  Rich