[Talk-us] name expansion bot (Re: Imports information on the wiki)
Rich
richlv at nakts.net
Fri Feb 17 20:32:39 GMT 2012
On 02/17/12 22:01, Paul Johnson wrote:
>
> On Fri, Feb 17, 2012 at 12:02 AM, Alan Mintz
> <Alan_Mintz+OSM at earthlink.net> wrote:
>
>> At 2012-01-15 05:35, Mike N wrote:
>>
>>> On 1/15/2012 8:28 AM, Nathan Edgars II wrote:
>>>
>>>> Actually the script also expanded the W to West. But my point is
>>>> that it is a TIGER entry error, and any future script needs to take
>>>> into account that these exist and people may have already fixed
>>>> them to the correct names.
>>>
>>> Agreed- if we're thinking of a bot that periodically fixes
>>> everything, we need a special tag that says
>>> "abbreviation_bot=back_off" (but perhaps not so verbose) -
>>> something that tells the bot not to touch the name because it is
>>> unusual and has been manually checked.
>>
>> I hope there is no such bot being contemplated again. The last one
>> created lots of issues.
>
> Sounds like it would be better to come up with a more comprehensive
> algorithm for the bot, not outright deny the need for it altogether.
> Granted, it did make minor messes in Oregon (where names with "St."
> meaning "Saint," "Santa/o" or "Sainte" are slightly more common) and
> Oklahoma (where single-letter street names are slightly more common),
> but overall, the automation saved countless hours of manual name
> expansion for the minor cost of having to deal with a very small number
> of largely regionally-isolated edge cases.
I would agree. On my USA visit I'm doing some minor edits, mainly in
Florida and Chicago, and those street names are quite painful to deal
with.
First, I suspect the damage done would still be far smaller than the
time spent on manual fixing. Second, it should be possible to run the
algorithm and put the results out for people to review: fix the problems
spotted, run it again and again until no problems are reported, then
just attack the data.
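A minimal sketch of that "run it, publish the results, review" step, in Python. It assumes ways arrive as plain dicts with `id` and `tags`, uses tiny illustrative lookup tables, and honours the opt-out tag suggested earlier in the thread (`abbreviation_bot=back_off` is hypothetical, not an established tag). It only treats the last word as a street type, so a leading "St." (often "Saint") is left alone, and it sends two-word names that start with a direction letter, like "W St", to a human instead of guessing.

```python
# Hypothetical opt-out tag, borrowed from the suggestion in this thread;
# it is not an established OSM tag.
SKIP_KEY, SKIP_VALUE = "abbreviation_bot", "back_off"

# Illustrative, deliberately incomplete lookup tables.
SUFFIXES = {"St": "Street", "Ave": "Avenue", "Rd": "Road", "Blvd": "Boulevard", "Dr": "Drive"}
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}


def expand_name(name):
    """Return (new_name, needs_review); new_name is None when nothing changes."""
    words = name.split()
    if len(words) < 2:
        return None, False  # single words ("Broadway") are left untouched
    keys = [w.rstrip(".") for w in words]  # treat "St." and "St" alike
    if len(words) == 2 and keys[0] in DIRECTIONS:
        # "W St" may be West Street or a real single-letter "W Street":
        # a human decides.
        return None, True
    out = list(words)
    if keys[0] in DIRECTIONS:
        out[0] = DIRECTIONS[keys[0]]
    # Only the last word is treated as a street type, so a leading "St."
    # (often "Saint") is never expanded.
    if keys[-1] in SUFFIXES:
        out[-1] = SUFFIXES[keys[-1]]
    if out == words:
        return None, False
    return " ".join(out), False


def propose_changes(ways):
    """Split ways into (changes, for_review) lists; nothing is written back."""
    changes, for_review = [], []
    for way in ways:
        tags = way["tags"]
        name = tags.get("name")
        if name is None or tags.get(SKIP_KEY) == SKIP_VALUE:
            continue  # unnamed, or a mapper has told the bot to back off
        new_name, needs_review = expand_name(name)
        if needs_review:
            for_review.append((way["id"], name))
        elif new_name is not None:
            changes.append((way["id"], name, new_name))
    return changes, for_review


if __name__ == "__main__":
    ways = [
        {"id": 1, "tags": {"name": "N Main St"}},
        {"id": 2, "tags": {"name": "St. Johns Rd"}},
        {"id": 3, "tags": {"name": "W St"}},
        {"id": 4, "tags": {"name": "W St", SKIP_KEY: SKIP_VALUE}},
    ]
    changes, for_review = propose_changes(ways)
    print(changes)     # [(1, 'N Main St', 'North Main Street'), (2, 'St. Johns Rd', 'St. Johns Road')]
    print(for_review)  # [(3, 'W St')]
```

Each review cycle would publish both lists; the tables and rules get fixed and the script is rerun until reviewers stop finding problems.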
This is similar to what we did with komzpa for Latvia (albeit on a much
smaller scale, of course). There were a lot of problems like
capitalisation mistakes, some common spelling mistakes, and some common
mistakes/silly approaches (like name=gas station, but not in English,
of course :) ). We also had a couple of people who took the JOSM warning
about unnamed objects way too seriously, so we had loads of roads,
lakes, power substations and other objects with name=. name=a name=l
name=u aaaand so on.
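Placeholder names like these are easy to flag mechanically. A rough Python sketch; `GENERIC_NAMES` is a hypothetical list (a real run would hold the local-language terms, not English ones). Because single-letter street names can be legitimate, it only flags names for review rather than deleting them.

```python
# Hypothetical list of generic descriptions that do not belong in name=.
# A real run would use terms in the local language.
GENERIC_NAMES = {"gas station", "road", "lake", "substation"}


def is_suspicious_name(name):
    """Flag placeholder, lowercase or generic names for human review."""
    stripped = name.strip()
    if len(stripped) <= 1:
        return True  # name=. name=a name=l name=u (may be real, so review only)
    if not any(ch.isalnum() for ch in stripped):
        return True  # punctuation only, e.g. "--"
    if stripped.islower():
        return True  # possible capitalisation mistake
    return stripped.lower() in GENERIC_NAMES


for candidate in [".", "a", "Gas Station", "main street", "Daugava"]:
    print(candidate, is_suspicious_name(candidate))
```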
komzpa prepared the first batch run with his automated script, which had
mostly been tested in Belarus, as far as I recall, and had some
Lithuanian changes as well. The results were disastrous :)
If he had submitted that, I'd probably have reported him as a vandal ;D
So I took the XML and reviewed every single change in it, reporting all
the problems I found. He applied some fixes and ran the script again. I
reviewed it again and reported a smaller number of problems. He spent
quite some time on those scripts, and I spent probably 6-8 hours in total
on all the review cycles. But the time needed to find and fix all those
problems manually would have been far, far more (or, more likely, most
of the problems would never have been fixed).
So: generate the change XMLs, go through them as a group, fix the
mistakes, and do the automated edit once nobody can find any problems.
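One way to produce such a file for reviewers, sketched with Python's standard `xml.etree.ElementTree`. It borrows the element names of the OSM osmChange format, but it is a simplified, review-only file: an uploadable `<modify>` has to carry the complete element (its version, all of its tags and, for a way, every `<nd>` ref), which this sketch leaves out. The input is assumed to be hypothetical `(way_id, old_name, new_name)` tuples, and the generator string is made up.

```python
import xml.etree.ElementTree as ET


def review_xml(changes):
    """Build a review-only, osmChange-style document from name changes.

    Not uploadable: each <way> lacks its version, other tags and <nd> refs.
    The old name is kept as an XML comment so reviewers see both values.
    """
    root = ET.Element("osmChange", version="0.6", generator="name-expansion-review")
    modify = ET.SubElement(root, "modify")
    for way_id, old_name, new_name in changes:
        way = ET.SubElement(modify, "way", id=str(way_id))
        ET.SubElement(way, "tag", k="name", v=new_name)
        way.append(ET.Comment(f" was: {old_name} "))
    return ET.tostring(root, encoding="unicode")


print(review_xml([(1, "N Main St", "North Main Street")]))
```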
--
Rich