[Talk-us] Address Node Import for San Francisco

Fri Dec 10 00:29:04 GMT 2010

On Thu, Dec 9, 2010 at 6:20 PM, Serge Wroclawski <emacsen at gmail.com> wrote:

> On Thu, Dec 9, 2010 at 6:00 PM, Gregory Arenius <gregory at arenius.com>
> wrote:
> > I've been working on an import of San Francisco address node data.  I have
> > several thoughts and questions and would appreciate any feedback.
>
> The Wiki page doesn't mention the original dataset url. I have a few
> concerns:
>
> 1) Without seeing the dataset url, it's hard to know anything about
> the dataset (its age, accuracy, etc.)
>
> This is a real problem with imports- knowing the original quality of
> the dataset before it's imported.
>
> The project has had to remove or correct so many bad datasets, it's
> incredibly annoying.
>
> > About the data.  It's in a shapefile format containing about 230,000
> > individual nodes.  The data is really high quality and all of the addresses
> > I have checked are correct.  It has pretty complete coverage of the entire
> > city.
>
> MHO is that individual node addresses are pretty awful. If you can
> import the building outlines, and then attach the addresses to them,
> great (and you'll need to consider what's to be done with any existing
> data), but otherwise, IMHO, this dataset just appears as noise.
>
> > Also, there are a large number of places where there are multiple nodes in
> > one location if there is more than one address at that location.  One
> > example would be a house broken into five apartments.  Sometimes they keep
> > one address and use apartment numbers and sometimes each apartment gets its
> > own house number.  In the latter cases there will be five nodes with
> > different addr:housenumber fields but identical addr:street and lat/long
> > coordinates.
> >
> > Should I keep the individual nodes or should I combine them?
>
> Honestly, I think this is very cart-before-horse. Please consider
> making a test of your dataset somewhere people can check out, and then
> solicit feedback on the process.
>
>
> > I haven't yet looked into how I plan to do the actual uploading but I'll
> > take care to make sure it's easily reversible if anything goes wrong and
> > doesn't hammer any servers.
>
> There are people who've spent years with the project and not gotten
> imports right, I think this is a less trivial problem than you might
> expect.
>
>
> > I've also made a wiki page for the import.
> >
> > Feedback welcome here or on the wiki page.
>
> This really belongs on the imports list as well, but my feedback would be:
>
> 1) Where's the shapefile? (if for nothing else, then for the license, but
> also for feedback)
> 2) Can you attach the addresses to real objects (rather than standalone
> nodes)?
> 3) What metadata will you keep from the other dataset?
> 4) How will you handle internally conflicting data?
> 5) How will you handle conflicts with existing OSM data?
>
> - Serge
>
>
A few comments...

1) San Francisco explicitly says they do not have building outline data. :(
So, I suppose we get to add buildings ourselves.  I do see that SF does have
parcels.

For DC, we are attaching addresses to buildings when there is a one-to-one
relationship between them.  When there are multiple address nodes for a single
building, we keep them as nodes.  In the vast majority of cases, we do not
have apartment numbers, but in some cases we have things like 1120a, 1120b,
1120c that can be imported.  Obviously, without a buildings dataset, our
approach won't quite apply for SF.
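For illustration only, here is a minimal Python sketch of the one-to-one rule described above.  It assumes each address record has already been matched to a containing building by some earlier spatial join; the `building_id` field and the `Address` / `split_addresses` names are hypothetical and not part of the DC or SF data.  Buildings with exactly one address get the addr:* tags; everything else stays a standalone node.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    housenumber: str
    street: str
    lat: float
    lon: float
    # Hypothetical: id of the containing building from a prior spatial join,
    # or None if the address falls outside any known building.
    building_id: Optional[int]


def split_addresses(addresses):
    """Return (tags_by_building, standalone_nodes).

    A building containing exactly one address receives that address's
    addr:* tags; every other address is kept as a separate node.
    """
    by_building = defaultdict(list)
    standalone = []
    for a in addresses:
        if a.building_id is None:
            standalone.append(a)
        else:
            by_building[a.building_id].append(a)

    tags_by_building = {}
    for bid, group in by_building.items():
        if len(group) == 1:
            # One-to-one: move the address onto the building.
            a = group[0]
            tags_by_building[bid] = {
                "addr:housenumber": a.housenumber,
                "addr:street": a.street,
            }
        else:
            # Several addresses (e.g. 1120a, 1120b): keep them as nodes.
            standalone.extend(group)
    return tags_by_building, standalone
```

The same grouping idea could be keyed on identical (lat, lon, street) instead of a building id to find the stacked apartment nodes Gregory mentions, if that turns out to be useful for SF.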

2) I don't consider the addresses as noise.  The data is very helpful for
geocoding.  If the renderer does a sloppy job making noise out of addresses,
the renderings should be improved.

3) Having looked at the data catalogue page, I do have concerns about the
terms of use and think it's best to get SF to explicitly agree to allow OSM
to use the data.

http://gispub02.sfgov.org/website/sfshare/index2.asp

4) If you can get explicit permission, then I suggest breaking up the
address nodes into smaller chunks (e.g. by census block group), convert them
to OSM format with Ian's shp-to-osm tool, and check them for quality and
against existing OSM data (e.g. existing POIs w/ addresses) in JOSM before
importing.  QGIS and/or PostGIS can be useful for chopping up the data into
geographic chunks.  This approach gives an opportunity to apply due diligence,
to check things, and keep chunks small enough that it's reasonably possible
to deal with any mistakes or glitches.
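A rough Python sketch of the chunking step, assuming each address record already carries a census block group (e.g. from a spatial join done in QGIS or PostGIS).  The `block_group` field name, the function names, and the generator string are hypothetical.  This is not Ian's shp-to-osm tool, just a stand-in showing roughly what one per-chunk .osm file for review in JOSM could contain, using negative ids for new objects.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def chunk_by_block_group(records):
    """Group address records (dicts) by their hypothetical 'block_group' field."""
    chunks = defaultdict(list)
    for rec in records:
        chunks[rec["block_group"]].append(rec)
    return dict(chunks)


def to_osm_xml(records):
    """Serialize one chunk as OSM XML with negative placeholder node ids."""
    root = ET.Element("osm", version="0.6", generator="address-chunk-sketch")
    for i, rec in enumerate(records, start=1):
        node = ET.SubElement(
            root, "node", id=str(-i),
            lat=f"{rec['lat']:.7f}", lon=f"{rec['lon']:.7f}",
        )
        # Only the two address tags discussed in this thread are written.
        for key in ("addr:housenumber", "addr:street"):
            ET.SubElement(node, "tag", k=key, v=rec[key])
    return ET.tostring(root, encoding="unicode")
```

Each string returned by `to_osm_xml` could then be saved as its own file (say, one per block group) and opened in JOSM to check against existing data before anything is uploaded.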

-Katie

> _______________________________________________
> Talk-us mailing list
> Talk-us at openstreetmap.org
> http://lists.openstreetmap.org/listinfo/talk-us
>

-- 
Katie Filbert
filbertk at gmail.com
@filbertkm