[Talk-us] Imports information on the wiki
Martijn van Exel
m at rtijn.org
Tue Jan 3 20:17:02 GMT 2012
On Mon, Jan 2, 2012 at 3:01 PM, Mike N <niceman at att.net> wrote:
> Continued from tagging...
> On 1/2/2012 11:40 AM, Martijn van Exel wrote:
>> What makes you think that the current common opinion is against
>> automatic expansion?
> Mostly because of customary resistance to automatic imports, which is
> rooted in bad imports. Josh has noted in Tagging that it could give a
> false impression of quality to others - I don't agree because I've never
> considered an expanded name a measure of quality or synonymous with the
> removal of the "tiger:reviewed=no" flag.
You're right and I think we need to be pragmatic about this. The TIGER
import was not perfect - the data was not perfect in the first place
(far from it) and things like name expansion should probably have been
done as part of the data preprocessing before the import. But the
situation as it exists now - with half the nation having expanded
names and half the nation with abbreviated ones - is something we need
>> What in particular were the complaints made against the fixbot?
> Some of it was by finding the few obscure cases that don't yield to
> automation: The city of St Louis has several cases of
> "St Louis Street"
> "Saint Louis Street"
> style duplication. The streets are physically not connected and you must
> use the correct naming convention to go to the right place.
> Also - "St Park Rd"; is this "Saint Park Road" or "State Park Road"?
Those are unfortunate false negatives. We probably cannot capture
each and every one of those. We need to weigh them against the benefit
of having consistent naming across the US. I personally lean toward
accepting these false negatives. That said, I would want to discuss
this more to get a better understanding of the number of false
negatives we'd be dealing with.
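One rough way to get a feel for that number would be to count the names that contain an ambiguous token anywhere except the last position. The sketch below assumes the names are already available as plain strings; the AMBIGUOUS table and both function names are made up for illustration, and only "St" comes from the examples in this thread.

```python
# Hypothetical table: abbreviations with more than one plausible expansion.
# Only "St" is taken from the thread; extend it as other cases turn up.
AMBIGUOUS = {"St": ("Saint", "Street", "State")}

def ambiguous_tokens(name):
    """Return ambiguous tokens that are not in the final position of the name."""
    tokens = name.replace(".", "").split()
    # The last token is normally the street type ("Main St"), so only the
    # earlier tokens are treated as doubtful here.
    return [t for t in tokens[:-1] if t in AMBIGUOUS]

def count_ambiguous(names):
    """Count names that would need a human decision before expansion."""
    return sum(1 for n in names if ambiguous_tokens(n))

names = ["St Louis Street", "Saint Louis Street", "St Park Rd", "Main St"]
print(count_ambiguous(names))
```

This only flags candidates for review; it does not decide which expansion is right.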
>> Can/should the fixbot be improved to take them into account?
> I believe we could let it run if everyone agrees on the safe expansions.
> * Rd -> * Road for example.
> And despite it being a bot, the author put in some significant block of
> time to review each upload for correctness and marked many errors that were
> clearly rooted in TIGER data.
It's unfortunate that it was never documented (unless I keep
overlooking the docs). Looking at the script this definitely does not
look like a quick hack.
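To make the "safe expansions" idea concrete, here is a minimal sketch. It is not the fixbot's code, which I am not reproducing here. It assumes a small agreed table of suffixes and only ever touches the last word of a name. SAFE_SUFFIXES and expand_safe are hypothetical names, and "Rd" is the only entry taken from this thread.

```python
# Hypothetical table of expansions everyone has agreed on.
# Anything ambiguous, such as "St", is deliberately left out.
SAFE_SUFFIXES = {"Rd": "Road"}

def expand_safe(name):
    """Expand the final token of a street name if it is a known safe suffix."""
    tokens = name.split()
    if not tokens:
        return name
    last = tokens[-1].rstrip(".")  # treat "Rd." the same as "Rd"
    if last in SAFE_SUFFIXES:
        tokens[-1] = SAFE_SUFFIXES[last]
        # Note: re-joining collapses any repeated whitespace in the name.
        return " ".join(tokens)
    return name

print(expand_safe("St Park Rd"))
print(expand_safe("Main St"))
```

Because only the suffix is expanded, "St Park Rd" would become "St Park Road" and the leading "St" would stay as it is for a human to resolve.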
> There is some minor discrepancy between TIGER abbreviations and common
> street sign / official USPS abbreviations also: Pky -> Pkwy = Parkway
>> especially if a fixbot was
>> only run on half the country, as was the case with the name expansion.
> Ironically, having a split country situation forces data consumers to
> handle both the abbreviated and expanded case (mainly Nominatim today).
> Even if the entire US data is expanded, that situation will continue to
> exist as new mappers arrive and have never dealt with anything but
> abbreviations on maps, street signs, or addresses. They may even
> re-abbreviate expanded names. Otherwise, we would need bots to run behind
> them and clean up new contributions to make them usable. Or waste other
> mappers' time to do it manually. But manually entered road names cannot be
> automatically expanded, since those will very seldom if ever have
> directional hints like TIGER data has.
Good point. Fixing it now is not a permanent solution. We will have to
keep monitoring. Which brings me to the question of responsibility: to
what extent do we - as the OSM US community - want or need to keep the
data consistent by running monitoring bots? I believe that a certain
degree of inconsistency is to be expected from a crowdsourced dataset
and we should be very careful antagonizing mappers with data
> Nominatim should still continue to handle both cases. Although it has been
> said that map rendering is the place to abbreviate the full name, no map
> renderer currently does this as far as I know. If there is ever a custom US
> style OSM renderer, that would be a logical feature.
martijn van exel
1109 1st ave #2
salt lake city, ut 84103