[Talk-us] Admin boundaries tied to roads

Alan Mintz Alan_Mintz+OSM at Earthlink.Net
Thu Apr 22 03:24:56 BST 2010


At 2010-04-21 17:12, andrzej zaborowski wrote:
>On 22 April 2010 01:18, Apollinaris Schoell <aschoell at gmail.com> wrote:
> > On Wed, Apr 21, 2010 at 3:36 PM, andrzej zaborowski <balrogg at gmail.com>
> > wrote:
> >> Where's damage in that -- is it in that you can now read the name out
> >> without checking the documentation for what that funny string means in
> >> that particular database that is TIGER?

I just had a machine crash as I was trying to find stats, but I'll bet that 
at least 90% of the cases are "St", "Ave"/"Av", and "Blvd"/"Bl", with the 
occasional "Ln" and "Cir"/"Cr" thrown in. When there's a lone N, S, E, or W 
as a prefix to a street name, it's clear to everyone what that means. These 
are the same abbreviations that _everyone_ uses every day - children, 
adults, businesses, governments, etc.
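For anyone who wants to check that guess against real data, here is a minimal sketch of the count. It assumes the street names have already been pulled out of a TIGER-derived extract into a plain list of strings; the function name `suffix_shares` is hypothetical, and the sample names are made up for illustration, not real statistics.

```python
from collections import Counter

def suffix_shares(names):
    """Return the share of names ending in each final word, most common first."""
    # Take the last whitespace-separated word of every non-empty name.
    counts = Counter(name.split()[-1] for name in names if name.split())
    total = sum(counts.values())
    return {suffix: count / total for suffix, count in counts.most_common()}

# Illustrative input only.
sample = ["N Main St", "Oak St", "Orange Ave", "Sunset Blvd"]
shares = suffix_shares(sample)
```

The last word is only a rough stand-in for the street type, since a trailing direction such as "SW" would be counted as a suffix too.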

Even when travelling to another country, it takes me very little time to 
understand what the common abbreviations in addresses stand for.


> > there is damage by doing it wrong, others have pointed to it already.

And I will do so again. My problem is mostly that this was done without a 
safety net. You clobbered existing data with no easy way to "walk it back". 
The existing name value should have been put in a foo_name tag so we could 
at least see what used to be. I would at least encourage that a bot be run 
to find these edits, find the previous version in history, and do this, if 
we can't soon agree on a better schema to split the name up into components 
at the same time.
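A rough sketch of what such a repair bot might do for one way, assuming its history is already available as an oldest-first list of versions, each a dict with hypothetical `user` and `tags` keys. The function name `backup_previous_name` and the `foo_name` key are placeholders, and fetching the history from the API is left out.

```python
def backup_previous_name(history, bot_user, backup_key="foo_name"):
    """Return updated tags with the pre-bot name saved, or None if nothing to do.

    history  -- oldest-first list of versions: {"user": str, "tags": dict}
    bot_user -- account name of the script that rewrote the names (assumed known)
    """
    # Walk backwards to find the most recent version where the bot changed the name.
    for i in range(len(history) - 1, 0, -1):
        before = history[i - 1]["tags"].get("name")
        after = history[i]["tags"].get("name")
        if history[i]["user"] == bot_user and before and before != after:
            tags = dict(history[-1]["tags"])
            # Never overwrite a backup that somebody has already filled in.
            tags.setdefault(backup_key, before)
            return tags
    return None

history = [
    {"user": "mapper", "tags": {"name": "N Main St", "highway": "residential"}},
    {"user": "expander_bot", "tags": {"name": "North Main Street", "highway": "residential"}},
]
fixed = backup_previous_name(history, "expander_bot")
```

The current name is left untouched here; only the old value is copied alongside it, so the change itself is easy to undo.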


> > I am not deep enough into the history of the abbreviations used and who
> > defined them. But I am pretty sure there is a lot of errors.

Errors that I, and a lot of other mappers, painstakingly fixed by hand, 
based on ground surveys and research into public records. In particular, 
I'm worried about the cases where I spelled out North because it was 
actually part of the name, as opposed to a cardinal direction related to 
addresses, which I left alone, hoping to later move the latter directions 
to an addr:direction_prefix tag, while leaving the former alone. I can no 
longer distinguish between the two.



>I don't know who defined the ones used in TIGER but this is not the
>only way to abbreviate the names, that is proven by USPS having their
>own list that is not identical.  The most popular words will be the
>same in both lists but some are really cryptic and arbitrary, could as
>well be numeric codes.  Then TIGER also includes Spanish names and the
>list has abbreviations for those too, which rarely anyone in US can
>read, while they can cope with unabbreviated ok.

I don't agree. Much of the US speaks Spanish. Many more possess the 
tremendous brainpower and enough grade-school Spanish required to know that 
Cl. in front of a street name might mean Calle or Cam. might mean Camino, 
or that S means Sur and N means Norte.


> > - in the city I live there is no street sign with street, avenue,
> > boulevard,
> > .... and even more surprising there are no abbreviations either. osm
> > principle is to map what's on the ground. So tiger import is definitely
> > wrong and expanding the names is also wrong. on the other hand postal
> > address usually use it in one or the other form so it's not completely
> > fiction.

Exactly. Many places in Orange County have the bad habit of leaving the 
suffix off the large street signs at intersections, perhaps as a way of 
saving space to reduce sign size and cost. Just because the big sign says 
just Orange doesn't mean that the street's real name isn't Orange Street, nor 
that it shouldn't be entered into any reasonable database or map that way. 
"map what's on the ground" is the wrong thing to do so often that I don't 
really understand why it was decided upon, nor why people continue to hold it 
up on a pedestal, despite continuing problems with it.


>For the record street signs on different ends of the same street often
>use different forms and you'll sometimes find really strange
>conventions, so while I agree mapping what's on the ground is good
>because stuff can be confirmed, in this case it's not a solution.  In
>many places you'll find the names are all caps on the signs but in a
>local newspaper they're capitalized the usual way.

And the signs are sometimes wrong. In the thousands of streets I've 
photographed and mapped, I've corrected hundreds of signage 
errors/inconsistencies, often requiring substantial research into records, 
and resulting in notification of the appropriate authority to fix the 
records and/or signs (for free :( ).


> > - many geocoding engines do not find expanded names. even google doesn't in
> > many cases. To me it looks like nearly anyone doesn't use the expanded name
> > at all. So my question is is the expanded name really the correct name?

Exactly! Sounds like its only useful purpose is text-2-speech. Here's what 
I'd like to see:

name: The pre-balrog name
name_direction_prefix: The 1-2 char cardinal direction before the root
use_name_direction_prefix: {yes|no} Yes indicates that the 
name_direction_prefix should be rendered/spoken
name_root: The actual root of the name
name_direction_suffix: The 1-2 char cardinal direction after the root
name_type: {St|Ave|Blvd|etc.} Common, documented abbreviations allowed
render_name: The name to be rendered on a map (if not name for some reason)
spoken_name: The complete expanded name, ready for text-2-speech

The post-balrog name should go into spoken_name for now. The pre-balrog 
name would be restored to the name tag and also split up into the name_* tags.
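The split into name_* tags could be sketched roughly as below. This is a heuristic under stated assumptions, not a finished parser: the `DIRECTIONS` and `TYPES` tables and the function name `split_name` are hypothetical, and the type list only covers the handful of abbreviations mentioned earlier in this message.

```python
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
TYPES = {"St", "Ave", "Av", "Blvd", "Bl", "Ln", "Cir", "Cr"}  # assumed, incomplete

def split_name(name):
    """Split an abbreviated street name into the proposed component tags."""
    words = name.split()
    tags = {"name": name}
    # Only a lone abbreviated direction counts as a prefix; a spelled-out
    # "North" is treated as part of the root.
    if len(words) > 1 and words[0] in DIRECTIONS:
        tags["name_direction_prefix"] = words.pop(0)
    if len(words) > 1 and words[-1] in DIRECTIONS:
        tags["name_direction_suffix"] = words.pop()
    if len(words) > 1 and words[-1] in TYPES:
        tags["name_type"] = words.pop()
    tags["name_root"] = " ".join(words)
    return tags

parts = split_name("N Main St")
```

Names such as "E St" are ambiguous to a rule like this and would still need a human to look at them, as would the decision about use_name_direction_prefix.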

In areas where the name_direction_prefix is really an address suffix, and 
does not appear in front of the name in written or spoken addresses or in 
official records, it could then be removed from the name, render_name, and 
spoken_name tags by someone who has verified this, where appropriate.

A bot and/or editor functionality could then populate any missing tags, so 
you're not forcing anyone to input things one way or the other. Personally, 
I would mostly populate the component tags with "the truth" and let the 
editors/bots do whatever else.
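Going the other direction, a bot or editor could derive the missing name and spoken_name from the component tags. A minimal sketch, assuming the tag names proposed above; the `EXPANSIONS` table and the function name `fill_missing_tags` are hypothetical, and the table is deliberately short.

```python
# Assumed expansion table; a real one would need to be agreed on and documented.
EXPANSIONS = {
    "N": "North", "S": "South", "E": "East", "W": "West",
    "St": "Street", "Ave": "Avenue", "Av": "Avenue",
    "Blvd": "Boulevard", "Bl": "Boulevard", "Ln": "Lane", "Cir": "Circle",
}

def fill_missing_tags(tags):
    """Add name and spoken_name if absent, built from the component tags."""
    out = dict(tags)
    prefix = tags.get("name_direction_prefix", "")
    if tags.get("use_name_direction_prefix", "yes") != "yes":
        prefix = ""  # direction is address-only, so keep it out of the name
    root = tags["name_root"]
    kind = tags.get("name_type", "")
    suffix = tags.get("name_direction_suffix", "")
    short = [prefix, root, kind, suffix]
    # The root is never expanded; only the direction and type components are.
    spoken = [EXPANSIONS.get(prefix, prefix), root,
              EXPANSIONS.get(kind, kind), EXPANSIONS.get(suffix, suffix)]
    # setdefault keeps whatever a mapper has already entered by hand.
    out.setdefault("name", " ".join(p for p in short if p))
    out.setdefault("spoken_name", " ".join(p for p in spoken if p))
    return out

filled = fill_missing_tags(
    {"name_direction_prefix": "N", "name_root": "Main", "name_type": "St"}
)
```

Because existing values are never overwritten, a mapper who has entered name by hand keeps it, and only the gaps are filled.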



>I don't know but it seems it's the only unambiguous form.  If you look
>at the names in TIGER, Bing, Google and USPS-abbreviated then they're
>all different and the only common trace is they're somehow derived
>from the full version.

That seems like a rendering issue, something you would always see in trying 
to reconstruct the correct underlying data from a rendering (including OSM 
now, which seems to be re-abbreviating in the renderers, thankfully). That 
doesn't mean we shouldn't try to understand the correct schema for the 
underlying data and model it correctly.


> >> The reason it was done with a script is that doing it manually was
> >> taking a lot of time and mappers were spending that time doing this
> >> instead of going out mapping.  And it's always been on the wiki about
> >> not using abbreviated names, even when the original import was done,
> >> ignoring this.

So what most newbies, including myself, did, was to follow the style of the 
majority of the data, instead of the often-outdated, incomplete, and 
inaccurate wiki, which is often not even self-consistent.


> > can you provide any stats that mappers spent time on it instead doing
> > anything better?
>
>Only as an indication, I spent a while doing that whenever I visited
>a place and at least another two people on IRC asked if there was any
>way to do it automatically, in JOSM or otherwise and we tried to find
>a way to do it in JOSM or with simple regexes on the .osm file but it
>seemed a much better idea to do it consistently for the whole area and
>according to actual documentation that accompanies TIGER.

In the Los Angeles area, I rarely saw expanded names (which is why I 
continue to abbreviate), except for those rare instances where someone drew 
a street from scratch before TIGER (apparently), and not even all of those.


> > Is the wiki any better as a reference than what is in the osm DB? I could
> > change the wiki and then will someone write a bot to reverse it? Is the
> > wiki
> > written with the situation in US in mind?
>
>Well one good rule is if there should be any rules then they should be global.

Why? Road network engineering, while probably striving to implement _some_ 
fundamental best practices, varies widely by location. Data modeling is 
supposed to represent the real world, not an imaginary cookie-cutter place. 
It makes sense to try to find commonalities and model them, but there's a 
line (perhaps fine) between doing this and trying to pound a square peg 
into a round hole.


>You could surely change the wiki but it's a conclusion that a lot of
>people individually seem to come to so I'm sure you wouldn't even need
>a bot before someone would add a phrase to that effect.

I don't know about "a lot". I mostly just hear people regurgitate the 
"don't abbreviate" mantra without justification. Admittedly, maybe it's 
because it's already been hashed out to death and I'm late to the party. 
Regardless, maybe I'm not alone, and it deserves some re-thinking.

Do people that are actually mapping (not bulk-importers) really want to 
type in "North Martin Luther King, Junior Boulevard Southwest" and then 
proofread that to make sure they didn't typo anything? Do people really 
want to follow around some of the more active, but less detail-oriented (or 
perhaps dyslexic) mappers to correct all those additional mistakes?

--
Alan Mintz <Alan_Mintz+OSM at Earthlink.net>




