[Talk-us] Santa Clara County (California) address import?

Tod Fitch tod at fitchdesign.com
Fri Jan 30 05:38:14 UTC 2015

I've walked all the streets within a couple of miles of my home to collect addresses for OpenStreetMap. In the area that I've tried to cover well there are now nearly 10,000 items tagged with addr:housenumber, nearly all of them surveyed and entered by me. I'd like to finish off my city and maybe the county, but I don't have the time to walk every street, and some streets are labeled as private, which I typically avoid.

Now I've found that there is an address data set for the entire county I live in on the Open Addresses web site [1]. It appears that the data originally came from the Santa Clara County government.

They have both the original data and processed data [2]. It looks like the processed CSV drops the prefix ("North", "South", etc.), the city and the ZIP code.

I've tossed together a quick script to convert the CSV file into an OSM XML file and overlaid the result on areas where I have manually collected data, and it looks reasonably good. Not perfect, but it seems to be a good enough source to consider importing. Each address will still need close examination to see if it is plausible, etc.
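For anyone who wants to try the same overlay, here is a minimal sketch of that kind of conversion. It is not my actual script. It assumes the processed CSV has columns named LON, LAT, NUMBER and STREET (check the header of the file you download), and the function name csv_to_osm is made up for illustration. New nodes get negative ids, which is how JOSM marks objects that have not been uploaded yet.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_osm(csv_text):
    """Convert address rows to an OSM XML string of new (negative-id) nodes.

    Assumed column names (not confirmed by the source): LON, LAT, NUMBER, STREET.
    """
    root = ET.Element("osm", version="0.6", generator="address-sketch")
    reader = csv.DictReader(io.StringIO(csv_text))
    next_id = -1
    for row in reader:
        number = (row.get("NUMBER") or "").strip()
        street = (row.get("STREET") or "").strip()
        # Skip rows that cannot produce a usable address.
        if not number or not street:
            continue
        node = ET.SubElement(
            root,
            "node",
            id=str(next_id),
            lat=row["LAT"].strip(),
            lon=row["LON"].strip(),
            visible="true",
        )
        ET.SubElement(node, "tag", k="addr:housenumber", v=number)
        ET.SubElement(node, "tag", k="addr:street", v=street)
        next_id -= 1
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "LON,LAT,NUMBER,STREET\n-121.9,37.3,123,Main St\n"
    print(csv_to_osm(sample))
```

The output can be saved with a .osm extension and opened as its own layer in JOSM. Since the processed CSV drops the street prefix, the addr:street values would still need checking by hand against the street names already in OSM.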

First question: is their Creative Commons license [3] compatible with OSM? From previous list traffic from SteveA regarding California government data, I believe that even if the CC license is not compatible with OSM, the county data will be licensed appropriately and I could get the data directly from the county rather than through Open Addresses.

Second question: what is the typical workflow for using shp files with JOSM? (I suspect either a plug-in is needed or some other tools to munge the data into something JOSM can use.)

Third (extended) question: I've never done a data import before. Am I correct in thinking the general process should be:
1. Set up a separate account to use for importing data.
2. Break the data into manageable chunks (one to a few square blocks).
3. Do one chunk at a time. "Doing" being defined as:
3a. Pull chunk into JOSM on one layer.
3b. Check for any existing addresses. Resolve discrepancies giving existing OSM data preference.
3c. For new addresses (not yet in OSM), verify the street names on the addresses are correct.
3d. For new addresses (not yet in OSM), verify the house numbers are plausible with the known numbering system in the area.
3e. Have a tag on each added or changed object indicating the source of the address data.
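To make the chunking step and the check for existing addresses above concrete, here is a rough sketch of how they might be scripted before anything is loaded into JOSM. Everything in it is an assumption for illustration: the function names, the dict layout for an address, and the 0.002 degree cell size (roughly 200 m at this latitude, so a block or two). It only sorts addresses into grid cells and separates ones whose house number and street already appear in OSM; the manual review in JOSM would still be needed.

```python
import math
from collections import defaultdict


def chunk_by_grid(addresses, cell_deg=0.002):
    """Group address dicts (keys: lat, lon, number, street) into grid cells.

    cell_deg is an assumed cell size in degrees, not a recommended value.
    """
    chunks = defaultdict(list)
    for addr in addresses:
        key = (
            math.floor(addr["lat"] / cell_deg),
            math.floor(addr["lon"] / cell_deg),
        )
        chunks[key].append(addr)
    return dict(chunks)


def address_key(addr):
    """Normalized (housenumber, street) pair used for a rough comparison."""
    return (addr["number"].strip().lower(), addr["street"].strip().lower())


def split_existing(chunk, existing_keys):
    """Split a chunk into (new, already_mapped) lists.

    existing_keys is a set of address_key() values taken from current OSM
    data. Existing OSM data is given preference, so matches are set aside.
    """
    new, already_mapped = [], []
    for addr in chunk:
        if address_key(addr) in existing_keys:
            already_mapped.append(addr)
        else:
            new.append(addr)
    return new, already_mapped
```

Matching on house number and street alone is crude: it would miss differences in abbreviation ("St" versus "Street"), so anything it reports as new would still need a look before upload.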

Am I going off into left field here, or is this a reasonable thing to do?

Tod Fitch

[1] http://openaddresses.io
[2] http://data.openaddresses.io/runs/1422342479.599/index.html
[3] http://creativecommons.org/about/cc0
