[Talk-us] Address Node Import for San Francisco

Gregory Arenius gregory at arenius.com
Fri Dec 10 00:38:15 GMT 2010

On Thu, Dec 9, 2010 at 3:20 PM, Serge Wroclawski <emacsen at gmail.com> wrote:

> On Thu, Dec 9, 2010 at 6:00 PM, Gregory Arenius <gregory at arenius.com>
> wrote:
> > I've been working on an import of San Francisco address node data.  I
> > have several thoughts and questions and would appreciate any feedback.
> The Wiki page doesn't mention the original dataset url. I have a few
> concerns:

The shapefile is <http://gispub02.sfgov.org/website/sfshare/catalog/sfaddresses.zip>

I added it to the wiki.  I'm sorry, it should have been there to start with.

> 1) Without seeing the dataset url, it's hard to know anything about
> the dataset (its age, accuracy, etc.)
> This is a real problem with imports- knowing the original quality of
> the dataset before it's imported.
> The project has had to remove or correct so many bad datasets, it's
> incredibly annoying.

I've spot checked a number of blocks by going out and comparing what is there
with the data, and I have been impressed with its accuracy.  The data is
sourced from the Department of Building Inspection's Address Verification
System, the Assessor-Recorder Office's Parcel database and the Department of
Elections (Voter Registration Project).  I believe it to be high quality, and
I have been told by someone else who has used it that the dataset is "legit."

> > About the data.  It's in a shapefile format containing about 230,000
> > individual nodes.  The data is really high quality and all of the
> > addresses I have checked are correct.  It has pretty complete coverage
> > of the entire city.
> MHO is that individual node addresses are pretty awful. If you can
> import the building outlines, and then attach the addresses to them,
> great (and you'll need to consider what's to be done with any existing
> data), but otherwise, IMHO, this dataset just appears as noise.

The wiki states that this is how address nodes are done.  They can be
attached to other objects, of course, but they can also be independent.  As
I stated earlier, I did check how they are actually being done elsewhere, and
the ones I've seen entered are done in this manner.

Also, why do you think of them as noise?  They're useful for geocoding and
door-to-door routing.  The routing in particular is something people clamor
for when it's lacking.

As for attaching them to buildings, that doesn't work particularly well in
many cases, especially in San Francisco.  For instance, a building might have
a number of addresses in it.  A large building taking up a whole block could
have addresses on multiple streets.  Also, we don't have building outlines
for most of SF, and that shouldn't stop us from having useful routing.

> > Also, there are a large number of places where there are multiple nodes
> > in one location if there is more than one address at that location.  One
> > example would be a house broken into five apartments.  Sometimes they
> > keep one address and use apartment numbers and sometimes each apartment
> > gets its own house number.  In the latter cases there will be five nodes
> > with different addr:housenumber fields but identical addr:street and
> > lat/long coordinates.
> > Should I keep the individual nodes or should I combine them?
> Honestly, I think this is a very cart-before-horse. Please consider
> making a test of your dataset somewhere people can check out, and then
> solicit feedback on the process.

As I'm still planning things out, I think it's a good time to discuss this
type of issue.  As to a test, what do you recommend?  Tossing the OSM file
up somewhere for people to see, or did you mean testing the upload process
on a dev server, that type of thing?  I'm planning on doing both, but if
you have other ideas that might help, I'm listening.
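Before deciding whether to keep or combine the stacked nodes, it helps to know how many locations are affected. A minimal sketch, assuming each source record has already been reduced to a hypothetical `(lat, lon, housenumber, street)` tuple (the real shapefile field names are not given in this thread), and relying on the stacked nodes having exactly identical coordinates, as described above:

```python
from collections import defaultdict

# Hypothetical record layout: (lat, lon, housenumber, street).
def group_by_location(records):
    """Map each exact (lat, lon) pair to the (housenumber, street) pairs there."""
    groups = defaultdict(list)
    for lat, lon, housenumber, street in records:
        groups[(lat, lon)].append((housenumber, street))
    return dict(groups)

def stacked_locations(records):
    """Keep only the locations that carry more than one address."""
    return {loc: addrs for loc, addrs in group_by_location(records).items()
            if len(addrs) > 1}
```

With the made-up records `(37.7749, -122.4194, "101", "Example Street")` and `(37.7749, -122.4194, "103", "Example Street")`, `stacked_locations` reports one location holding both house numbers; counting those entries shows how often the keep-or-combine question actually comes up.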

> > I haven't yet looked into how I plan to do the actual uploading but I'll
> > take care to make sure it's easily reversible if anything goes wrong and
> > doesn't hammer any servers.
> There are people who've spent years with the project and not gotten
> imports right, I think this is a less trivial problem than you might
> expect.

I hear this every time imports come up.  I got it.  It's hard.  That's why
I'm soliciting feedback, am willing to take my time, and am really trying to
do it correctly.  I'm not willing to just give up because there have been
problems with imports in the past.
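One way to keep an upload easy to reverse and gentle on the servers is to split it into small, numbered batches (for example one changeset per batch) and log what went into each. A minimal sketch, assuming the nodes are already prepared as a list; the default batch size of 1000 is an arbitrary placeholder, not an OSM API limit, and the actual upload call is deliberately left out:

```python
from typing import Iterator, List, Sequence, Tuple

def batches(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]

def plan_upload(nodes: Sequence, size: int = 1000) -> List[Tuple[int, Sequence]]:
    """Number each batch so it can be uploaded, logged and reverted on its own.

    The default of 1000 is a placeholder chosen for illustration only.
    """
    return list(enumerate(batches(nodes, size), start=1))
```

Each numbered batch can then be uploaded with a pause in between, and its batch number recorded next to the resulting changeset, so a bad batch can be identified and reverted without touching the rest.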

> > I've also made a wiki page for the import.
> >
> > Feedback welcome here or on the wiki page.
> This really belongs on the imports list as well, but my feedback would be:
> 1) Where's the shapefile? (if for nothing else, then for the license, but
> also for feedback)
I added it to the wiki page.  Again, I'm sorry it wasn't there to begin
with.  The shapefile is
<http://gispub02.sfgov.org/website/sfshare/catalog/sfaddresses.zip>.
As for the license, I believe it's okay, but I posted that bit to talk legal
because I thought it belonged there.

> 2) Can you attach the addresses to real objects (rather than standalone
> nodes)?

Generally speaking, no.  We don't have shapes for most of the buildings in
the city, and even if we did, many of them are large buildings with multiple
addresses within them.  I do have access to a parcel map that could be
imported and that has address ranges for each parcel, but it doesn't deal
well with large plots that have addresses on multiple streets, it is larger
than we need for this purpose, and I just didn't think it would be an appropriate

> 3) What metadata will you keep from the other dataset?

I wasn't planning on keeping any; just lat/long, addr:housenumber and
addr:street.  The source will be dealt with by using an account just for
this import.  There is metadata that I'm dropping, such as what the original
source was (AVS or VRJ or Parcel data).  I'm also dropping a lot of metadata
about the street, such as the breakdown of street name, street type and
street cardinal directions, that sort of thing.  Also being dropped is the
unique object ID for each node.  I think that the housenumber/street combo
is a unique enough ID.
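Keeping only those three pieces of information maps directly onto an OSM XML file. A minimal sketch using the standard library's `xml.etree.ElementTree`, assuming the same hypothetical `(lat, lon, housenumber, street)` tuples; the `generator` value is made up, and the negative ids follow the usual convention for not-yet-uploaded objects in editors such as JOSM:

```python
import xml.etree.ElementTree as ET

def to_osm_xml(records):
    """Build an OSM XML document from (lat, lon, housenumber, street) tuples.

    Only the coordinates and the two addr:* tags are written; every other
    source field is dropped. New nodes get negative ids (-1, -2, ...).
    """
    root = ET.Element("osm", version="0.6", generator="sf-address-sketch")
    for i, (lat, lon, housenumber, street) in enumerate(records, start=1):
        node = ET.SubElement(root, "node", id=str(-i),
                             lat=f"{lat:.7f}", lon=f"{lon:.7f}")
        ET.SubElement(node, "tag", k="addr:housenumber", v=housenumber)
        ET.SubElement(node, "tag", k="addr:street", v=street)
    return ET.tostring(root, encoding="unicode")
```

Writing the coordinates with seven decimal places keeps them at the precision OSM stores, so the file can be opened and reviewed in an editor before anything is uploaded.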

> 4) How will you handle internally conflicting data?
I've looked through the data for a long time and have found only one dupe.
The number is small enough to examine each case on its own.
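Since the housenumber/street combo is treated as the unique ID, internal conflicts are simply combos that occur more than once. A minimal sketch that lists them for case-by-case review, assuming the hypothetical `(lat, lon, housenumber, street)` tuples and exact string matching within the source dataset:

```python
from collections import Counter

def duplicate_addresses(records):
    """Return the (housenumber, street) pairs that occur more than once."""
    counts = Counter((hn, street) for _lat, _lon, hn, street in records)
    return sorted(key for key, n in counts.items() if n > 1)
```

Exact matching is used here on the assumption that the source formats its fields consistently; records that differ only in spacing or capitalization would not be flagged.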

> 5) How will you handle conflicts with existing OSM data?
Data in OSM is assumed to be better.   If there is a matching
housenumber/street combo already in OSM then a second won't be uploaded.
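That rule can be expressed as a filter over the import records. A minimal sketch, assuming the existing OSM addresses have already been extracted as `(housenumber, street)` pairs (how they are fetched is outside this sketch) and that hand-entered OSM data may differ from the import in case or spacing, so both sides are normalized before comparing:

```python
def normalize(housenumber, street):
    """Case- and whitespace-insensitive key for an address."""
    return (" ".join(housenumber.split()).casefold(),
            " ".join(street.split()).casefold())

def filter_existing(import_records, existing_addresses):
    """Drop import records whose housenumber/street combo is already in OSM.

    import_records: (lat, lon, housenumber, street) tuples.
    existing_addresses: iterable of (housenumber, street) pairs from OSM.
    """
    existing = {normalize(hn, st) for hn, st in existing_addresses}
    return [r for r in import_records if normalize(r[2], r[3]) not in existing]
```

Because the existing OSM data wins, a match simply removes the import record; nothing already in OSM is modified.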

> - Serge
