[Talk-us] Tidying up TIGER data

Ted Percival ted at midg3t.net
Thu Jun 4 21:59:22 BST 2009


Dave Hansen wrote:
> On Thu, 2009-06-04 at 00:36 -0600, Ted Percival wrote:
>> Its functions are:
>> - Strip "St" suffix from grid-named streets (eg. "South 500 West")
>> - Collapse multiple spaces into a single space (lots of TIGER)
>> - Expand abbreviated directions (eg. "S 500 E" to "South 500 East")
>> - Expand abbreviated suffixes ("Rd" -> "Road", "St" -> "Street", etc)
> 
> This kind of script is useful for small areas that you've looked at
> manually, but please don't apply it too widely.  It does the right
> actions for sanely-named things, but TIGER is full of goofy stuff.
> 
> Consider: "St. Helens St.".  There are also plenty of semi-mistakes or
> weird abbreviations in TIGER that appear to be mistakes.  I wouldn't be
> surprised to see "Saint Street" entered somewhere as
> 
> 	name: "St."
> 	type: "St."
> 
> We don't want to make that "Street Street".  That makes it even
> worse. :)

For street suffix expansion, the script only matches an abbreviation
when it is the last word of the name, so "St Helens St" would become
"St Helens Street". That should avoid false positives in most cases.
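
To illustrate the end-of-name matching, here is a minimal Python sketch.
It is not the actual script: the suffix table, the function name
expand_suffix, the optional trailing period, and the requirement that
another word precede the suffix are all assumptions made for the example.

```python
import re

# Hypothetical suffix table; the real script's list is not shown in this thread.
SUFFIXES = {"St": "Street", "Rd": "Road", "Ave": "Avenue"}

# Match an abbreviation only when it is the last word of the name.
# (?<=\s) requires a preceding word, \.? tolerates a trailing period,
# and $ anchors the match to the end of the string.
_SUFFIX_RE = re.compile(r"(?<=\s)(" + "|".join(SUFFIXES) + r")\.?$")

def expand_suffix(name):
    """Expand a trailing abbreviated suffix and leave the rest untouched."""
    return _SUFFIX_RE.sub(lambda m: SUFFIXES[m.group(1)], name)

print(expand_suffix("St Helens St"))    # St Helens Street
print(expand_suffix("St. Helens St."))  # St. Helens Street
print(expand_suffix("St."))             # St.
```

Because the pattern is anchored at the end, a leading "St" is never
touched, and a name consisting only of "St." is left alone. A name that
was entered as "St. St." would still become "St. Street", which is the
sort of corner case that needs a human look.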

While I aimed to make the matching as conservative as possible to avoid
errors, I figure there will be corner cases *somewhere* that it will get
wrong. Hopefully that is offset by being correct in the vast majority of
cases, and changes will be given at least cursory review by an
intelligent being.



