[OSM-dev] extracting house-number from string

David Lynch djlynch at gmail.com
Sat Mar 14 20:39:09 GMT 2009


On Sat, Mar 14, 2009 at 09:47, Frederik Ramm <frederik at remote.org> wrote:
> 5325 E. Pacific Coast Hwy
> Long Beach, CA 90804

A quick breakdown of that address:
Building number: 5325
Street name: E Pacific Coast Hwy
City: Long Beach
State: CA
Postcode: 90804

You are correct that the "E" references east, but it's not the east
side of the street. It's part street name (and will be found there in
the OSM data), part address. Essentially, many US cities are numbered
on a Cartesian grid, with its origin in the center of the city. So "5325
East Pacific Coast Highway" indicates that this address is to the east
of where Pacific Coast Highway crosses the dividing line. (That the
address is east of the dividing line also implies that Pacific Coast
Highway is an east-west road.) Side of the street is indicated by
whether a number is even or odd. So, 5325 East is near 5300 East but
on the opposite side of the street. Both are quite a long way
(probably several miles) from 5300 West.
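
To make the even/odd and East/West rules concrete, here is a minimal
Python sketch. The function names are my own invention for
illustration, and it assumes the building number is a plain integer,
which is not true of every US address.

```python
def same_side_of_street(a: int, b: int) -> bool:
    """Two building numbers share a side of the street if both are even or both odd."""
    return a % 2 == b % 2


def same_half_of_street(dir_a: str, dir_b: str) -> bool:
    """Addresses with the same directional (E, W, ...) lie on the same side of the dividing line."""
    return dir_a.strip(".").upper() == dir_b.strip(".").upper()


# 5325 E and 5300 E: same half of the street, opposite sides.
print(same_half_of_street("E.", "E"), same_side_of_street(5325, 5300))
# 5300 E and 5300 W: same side by parity, but different halves.
print(same_half_of_street("E", "W"), same_side_of_street(5300, 5300))
```

Parity only tells you that two numbers are on the same side; which side
(north or south, say) gets the even numbers varies from place to place.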

From a programming standpoint, I'll give you some tips on US addresses:
- The United States Postal Service is generally accepted as the
arbiter of address formatting in the USA, and they mandate that pretty
much everything that can be abbreviated is abbreviated, so the TIGER
data (and most OSM mappers I'm aware of) do the same. The list of
abbreviations is here:
http://www.usps.com/ncsc/lookups/usps_abbreviations.html#suffix
- Directions can come before the "main" name (S 1st Street), after
(Pennsylvania Avenue NW), or on both sides (W Loop 410 N). The TIGER
data in the US is not very good at getting them in the right position.
- "Suffixes" (Street, Lane, Way, Boulevard, etc.) are frequently
dropped when giving addresses informally.
- Building number always comes before street name, followed by city,
state, and ZIP code (postcode), in that order.
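
The tips above could be turned into a rough parser along these lines.
This is only a sketch under several assumptions of mine: the address
arrives as two lines (street, then "City, ST ZIP"), the building number
is all digits, and the DIRECTIONS and SUFFIXES sets are small
hand-picked samples rather than the full USPS list. The name
parse_us_address is hypothetical.

```python
import re

# Assumed sample sets; the real USPS abbreviation list is much longer.
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
SUFFIXES = {"ST", "AVE", "BLVD", "HWY", "LN", "WAY", "RD", "DR"}

# "City, ST 12345" or "City, ST 12345-6789"
LAST_LINE = re.compile(
    r"^(?P<city>.+?),\s*(?P<state>[A-Z]{2})\s+(?P<postcode>\d{5}(?:-\d{4})?)$"
)


def parse_us_address(text: str) -> dict:
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) != 2:
        raise ValueError("expected a street line and a city/state/ZIP line")
    street_line, last_line = lines

    match = LAST_LINE.match(last_line)
    if match is None:
        raise ValueError("could not parse city/state/ZIP: %r" % last_line)

    # Drop periods so "E." and "E" are treated alike.
    tokens = street_line.replace(".", "").split()
    if len(tokens) < 2 or not tokens[0].isdigit():
        raise ValueError("street line must start with a numeric building number")
    number, rest = tokens[0], tokens[1:]

    # Directions may appear before the name, after it, or both.
    # Always leave at least one token for the street name itself.
    pre_dir = post_dir = street_type = None
    if len(rest) > 1 and rest[0].upper() in DIRECTIONS:
        pre_dir = rest.pop(0).upper()
    if len(rest) > 1 and rest[-1].upper() in DIRECTIONS:
        post_dir = rest.pop().upper()
    # The suffix is optional, since it is often dropped informally.
    if len(rest) > 1 and rest[-1].upper() in SUFFIXES:
        street_type = rest.pop()

    return {
        "housenumber": number,
        "pre_direction": pre_dir,
        "name": " ".join(rest),
        "street_type": street_type,
        "post_direction": post_dir,
        "city": match.group("city"),
        "state": match.group("state"),
        "postcode": match.group("postcode"),
    }


print(parse_us_address("5325 E. Pacific Coast Hwy\nLong Beach, CA 90804"))
```

A street whose actual name is a direction word (for example one simply
called "North Street") is ambiguous to this approach, which is why the
sketch never strips the last remaining token.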

-- 
David J. Lynch
djlynch at gmail.com



