[Talk-us] Address Node Import for San Francisco

Michal Migurski mike at stamen.com
Sun Dec 12 18:04:47 GMT 2010


On Dec 9, 2010, at 3:00 PM, Gregory Arenius wrote:

> About the data. It's in shapefile format and contains about 230,000 individual nodes. The data is really high quality, and all of the addresses I have checked are correct. It has pretty complete coverage of the entire city.

I've worked with this file before. When I matched it to OSM data two years ago, I found that the SF data had numerous errors, so I wrote this mapping script:

	http://mike.teczno.com/img/sf-addresses/mapping.py
	Usage: mapping.py [osm streets csv] [sf streets csv] > [street names csv]

Here are all the street names in the shapefile:
	http://mike.teczno.com/img/sf-addresses/sfaddresses.csv

Here are all the street names in OSM at the time I did the comparison (may have changed since):
	http://mike.teczno.com/img/sf-addresses/osm_streets.csv

And this is the mapping result I got:
	http://mike.teczno.com/img/sf-addresses/street_names.csv

Hopefully this is helpful, as you'll want to import street names that actually match those in OSM's view of San Francisco.
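
For anyone who wants to try this kind of comparison without reading mapping.py, here is a minimal sketch of one way to match shapefile street names to OSM street names. It is not the logic of mapping.py itself. The suffix table, the function names (expand, match_names) and the 0.8 similarity cutoff are all assumptions made for illustration, and real data would need a longer suffix table and manual review of the fuzzy matches.

```python
import difflib

# Hypothetical abbreviation table; the real shapefile likely needs more entries.
SUFFIXES = {
    "ST": "Street",
    "AVE": "Avenue",
    "BLVD": "Boulevard",
    "DR": "Drive",
    "CT": "Court",
    "WAY": "Way",
}


def expand(name):
    """Turn an upper-case abbreviated name into an OSM-style name."""
    words = name.strip().split()
    if not words:
        return ""
    last = words[-1].upper().rstrip(".")
    head = [w.capitalize() for w in words[:-1]]
    tail = SUFFIXES.get(last, words[-1].capitalize())
    return " ".join(head + [tail])


def match_names(sf_names, osm_names, cutoff=0.8):
    """Map each shapefile name to an OSM name, or None if nothing is close."""
    osm_names = list(osm_names)
    known = set(osm_names)
    result = {}
    for sf in sf_names:
        candidate = expand(sf)
        if candidate in known:
            # Exact match after expanding the suffix.
            result[sf] = candidate
            continue
        # Fall back to the single closest OSM name above the cutoff.
        close = difflib.get_close_matches(candidate, osm_names, n=1, cutoff=cutoff)
        result[sf] = close[0] if close else None
    return result
```

Names that come back as None are the ones to look at by hand before importing.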

I found some other weird burrs in the data as well, in terms of how it arranges addresses stacked on top of one another in tall buildings. Nothing that can't be dealt with in an import.

I also did a bunch of geometry work to match those address points to nearby street segments in order to break up the street grid into address segments, but that code is a bit of a rat's nest. The idea was to build up the little block numbers you see rendered here:
	http://www.flickr.com/photos/mmigurski/5229627985/sizes/l/
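
The general idea of snapping address points to street segments can be sketched roughly as follows. This is not the code described above, only an illustration under stated assumptions: coordinates are already projected to a planar system (not raw lat/lon), every segment is a straight line between two points, and the names point_segment_distance, nearest_segment and block_ranges are hypothetical. A real import would also want a spatial index rather than this brute-force search.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b, all (x, y) tuples."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        # Degenerate segment: both ends are the same point.
        return math.hypot(px - ax, py - ay)
    # Project p onto the line, then clamp to the segment's ends.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_segment(point, segments):
    """Return the id of the closest segment; segments are (id, start, end)."""
    best = min(segments, key=lambda s: point_segment_distance(point, s[1], s[2]))
    return best[0]


def block_ranges(addresses, segments):
    """Group (house_number, point) pairs by nearest segment.

    Returns {segment_id: (lowest_number, highest_number)}.
    """
    ranges = {}
    for number, point in addresses:
        seg_id = nearest_segment(point, segments)
        low, high = ranges.get(seg_id, (number, number))
        ranges[seg_id] = (min(low, number), max(high, number))
    return ranges
```

The low and high numbers per segment are what would end up as the block labels.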

Katie's suggestion of breaking the data into smaller chunks is a good one.

-mike.

----------------------------------------------------------------
michal migurski- mike at stamen.com
                 415.558.1610





