[Talk-us] Address Node Import for San Francisco

Serge Wroclawski emacsen at gmail.com
Thu Dec 9 23:20:31 GMT 2010

On Thu, Dec 9, 2010 at 6:00 PM, Gregory Arenius <gregory at arenius.com> wrote:
> I've been working on an import of San Francisco address node data.  I have
> several thoughts and questions and would appreciate any feedback.

The Wiki page doesn't mention the original dataset url. I have a few concerns:

1) Without seeing the dataset url, it's hard to know anything about
the dataset (its age, accuracy, etc.)

This is a real problem with imports: knowing the original quality of
the dataset before it's imported.

The project has had to remove or correct so many bad datasets that
it's incredibly annoying.

> About the data.  Its in a shapefile format containing about 230,000
> individual nodes.  The data is really high quality and all of the addresses
> I have checked are correct.  It has pretty complete coverage of the entire
> city.

IMHO, individual node addresses are pretty awful. If you can
import the building outlines, and then attach the addresses to them,
great (and you'll need to consider what's to be done with any existing
data), but otherwise, IMHO, this dataset just appears as noise.

> Also, there are a large number of places where there are multiple nodes in
> one location if there is more than one address at that location.  One
> example would be a house broken into five apartments.  Sometimes they keep
> one address and use apartment numbers and sometimes each apartment gets its
> own house number.  In the latter cases there will be five nodes with
> different addr:housenumber fields but identical addr:street and lat/long
> coordinates.

> Should I keep the individual nodes or should I combine them?

Honestly, I think this is putting the cart before the horse. Please consider
making a test of your dataset somewhere people can check out, and then
solicit feedback on the process.
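
As one way to make the co-located nodes question concrete for such a test,
here is a minimal sketch that groups address nodes sharing the same
coordinates and street. It assumes the shapefile has already been converted
to a list of dicts; the `lat` and `lon` keys, the 7-decimal rounding, and
the function name are illustrative assumptions, not part of the proposed
import.

```python
from collections import defaultdict


def group_colocated(nodes, precision=7):
    """Group house numbers by (lat, lon, street).

    `nodes` is assumed to be a list of dicts with the keys
    'lat', 'lon', 'addr:street' and 'addr:housenumber'.
    Only locations holding more than one address are returned.
    """
    groups = defaultdict(list)
    for node in nodes:
        key = (
            round(node["lat"], precision),
            round(node["lon"], precision),
            node["addr:street"],
        )
        groups[key].append(node["addr:housenumber"])
    # Keep only the locations where several addresses are stacked.
    return {key: sorted(nums) for key, nums in groups.items() if len(nums) > 1}


# Hypothetical sample data, for illustration only.
sample = [
    {"lat": 37.76, "lon": -122.42, "addr:street": "Example Street", "addr:housenumber": "12"},
    {"lat": 37.76, "lon": -122.42, "addr:street": "Example Street", "addr:housenumber": "10"},
    {"lat": 37.77, "lon": -122.43, "addr:street": "Example Street", "addr:housenumber": "20"},
]
stacked = group_colocated(sample)
```

A report like this shows how many locations are affected before anyone
decides whether to keep the nodes separate or combine them.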

> I haven't yet looked into how I plan to do the actual uploading but I'll
> take care to make sure its easily reversible if anything goes wrong and
> doesn't hammer any servers.

There are people who've spent years with the project and not gotten
imports right. I think this is a less trivial problem than you might think.

> I've also made a wiki page for the import.
> Feedback welcome here or on the wiki page.

This really belongs on the imports list as well, but my feedback would be:

1) Where's the shapefile? (If for nothing else than the license, but
also for feedback.)
2) Can you attach the addresses to real objects (rather than standalone nodes)?
3) What metadata will you keep from the other dataset?
4) How will you handle internally conflicting data?
5) How will you handle conflicts with existing OSM data?
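
For question 4, one possible check (a sketch, not a full answer) is to flag
any address that shows up at more than one position in the source data. The
dict keys, the lowercase normalization, and the function name below are
assumptions made for illustration.

```python
from collections import defaultdict


def find_internal_conflicts(nodes):
    """Return addresses that appear at more than one coordinate.

    `nodes` is assumed to be a list of dicts with the keys
    'lat', 'lon', 'addr:street' and 'addr:housenumber'.
    """
    positions = defaultdict(set)
    for node in nodes:
        address = (
            node["addr:street"].strip().lower(),
            node["addr:housenumber"].strip(),
        )
        positions[address].add((node["lat"], node["lon"]))
    # An address mapped to several distinct positions needs manual review.
    return {addr: sorted(pos) for addr, pos in positions.items() if len(pos) > 1}


# Hypothetical sample data, for illustration only.
records = [
    {"lat": 37.76, "lon": -122.42, "addr:street": "Example Street", "addr:housenumber": "10"},
    {"lat": 37.79, "lon": -122.40, "addr:street": "example street", "addr:housenumber": "10"},
    {"lat": 37.77, "lon": -122.43, "addr:street": "Example Street", "addr:housenumber": "20"},
]
conflicts = find_internal_conflicts(records)
```

Whatever the check reports would still need a person to look at it; the
sketch only narrows down where to look.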

- Serge
