[Talk-us] Address Node Import for San Francisco

Gregory Arenius gregory at arenius.com
Thu Dec 9 23:00:07 GMT 2010

I've been working on an import of San Francisco address node data.  I have
several thoughts and questions and would appreciate any feedback.

About the data.  It's in shapefile format and contains about 230,000
individual nodes.  The data is really high quality and all of the addresses
I have checked are correct.  It has pretty complete coverage of the entire

First, I've looked at how address nodes have been input manually.  In some
places they are just addr:housenumber and addr:street and nothing else.  In
other places they include the city and the country and sometimes another
administrative level such as state.  Since the last three pieces of
information can be fairly easily derived I was thinking of just doing the
house number and the street.   The dataset is fairly large so I don't want
to include any extra fields if I don't have to.  Is this level of
information sufficient?  Or should I include the city and the state and the
country in each node?
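
To make the two options concrete, here is a minimal sketch of the tag set
for one address node at each level of detail.  The function name and the
example values are hypothetical, and the addr:city, addr:state and
addr:country keys and values are my assumption of how the fuller variant
would be tagged.

```python
def address_tags(housenumber, street, full=False):
    """Return the OSM tags for one address node.

    full=False: only addr:housenumber and addr:street.
    full=True:  also adds city, state and country (assumed values).
    """
    tags = {
        "addr:housenumber": str(housenumber),
        "addr:street": street,
    }
    if full:
        # Assumed values; these are identical for every node in the import.
        tags.update({
            "addr:city": "San Francisco",
            "addr:state": "CA",
            "addr:country": "US",
        })
    return tags
```

The minimal variant carries two tags per node and the full variant five,
so across roughly 230,000 nodes the difference is on the order of 690,000
extra tags.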

Also, there are a large number of places where there are multiple nodes in
one location if there is more than one address at that location.  One
example would be a house broken into five apartments.  Sometimes they keep
one address and use apartment numbers and sometimes each apartment gets its
own house number.  In the latter cases there will be five nodes with
different addr:housenumber fields but identical addr:street and lat/long
coordinates.  Should I keep the individual nodes or should I combine them?
For instance, I could do one node and have addr:housenumber=5;6;7;8;9 or
have a node for each address.   Combining nodes would cut the number of
nodes imported by about 40% but I fear that it might be harder to work with
manually and also not recognized by routers and other software.
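
A rough sketch of what the combining option could look like, assuming the
shapefile records have already been read into dicts with hypothetical
keys 'lat', 'lon', 'street' and 'housenumber' (house numbers as strings).
It groups nodes that share exact coordinates and street, then joins their
house numbers with semicolons.

```python
from collections import defaultdict


def combine_colocated(nodes):
    """Merge nodes with identical lat/lon and street into one node.

    nodes: iterable of dicts with 'lat', 'lon', 'street', 'housenumber'.
    Returns a list of dicts with a semicolon-separated addr:housenumber.
    """
    groups = defaultdict(list)
    for n in nodes:
        # Exact coordinate match, as in the source data described above.
        groups[(n["lat"], n["lon"], n["street"])].append(n["housenumber"])

    combined = []
    for (lat, lon, street), numbers in groups.items():
        combined.append({
            "lat": lat,
            "lon": lon,
            "addr:street": street,
            "addr:housenumber": ";".join(numbers),
        })
    return combined
```

Keeping one node per address is simply the input list unchanged, so the
choice only affects this one step.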

Before importing the data I will run a comparison against existing OSM data
and not upload nodes that match an existing addr:housenumber/addr:street
combination.  There aren't many plain address nodes in the city at the
moment (a couple hundred, tops) but there are a fair number of businesses
that have had address data added to them and I don't want any duplicate
address nodes as a result of this import.
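
The comparison could be sketched roughly as below.  It assumes the
existing OSM objects (plain address nodes and businesses alike) are
available as a list of tag dicts, and that matching should ignore case
and extra whitespace; both the function names and that normalization
rule are my assumptions.

```python
def normalize(housenumber, street):
    """Build a comparison key that ignores case and extra whitespace."""
    return (
        str(housenumber).strip().lower(),
        " ".join(street.split()).lower(),
    )


def filter_new(import_nodes, existing_tags):
    """Drop import nodes whose housenumber/street pair already exists.

    import_nodes:  list of tag dicts to be imported.
    existing_tags: list of tag dicts from objects already in OSM.
    """
    seen = {
        normalize(t["addr:housenumber"], t["addr:street"])
        for t in existing_tags
        if "addr:housenumber" in t and "addr:street" in t
    }
    return [
        n for n in import_nodes
        if normalize(n["addr:housenumber"], n["addr:street"]) not in seen
    ]
```

A stricter or looser match (for example expanding "St" to "Street") would
only change normalize().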

There are only a very few address ways in the SF dataset but they aren't
anywhere near as accurate as the data I will be importing so I plan on deleting

I haven't yet looked into how I plan to do the actual uploading but I'll
take care to make sure it's easily reversible if anything goes wrong and
doesn't hammer any servers.
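
One possible shape for this, purely as a sketch: upload in small batches,
pause between them, and record an identifier for each batch so it can be
found again if it needs reverting.  The upload_batch callable is a
hypothetical stand-in for whatever tool ends up doing the real upload,
and the batch size and pause are placeholder values.

```python
import time


def upload_in_batches(nodes, upload_batch, batch_size=500, pause_seconds=5.0):
    """Upload nodes in batches, pausing between batches.

    upload_batch: hypothetical callable that uploads one list of nodes
                  and returns an identifier for that batch.
    Returns the identifiers in upload order, for later review or revert.
    """
    batch_ids = []
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        batch_ids.append(upload_batch(batch))
        # Pause between batches, but not after the last one.
        if start + batch_size < len(nodes):
            time.sleep(pause_seconds)
    return batch_ids
```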

I've also made a wiki page for the

Feedback welcome here or on the wiki page.

Gregory Arenius
