[Imports-us] Address and Building Data Conflation
emacsen at gmail.com
Mon Jul 15 15:08:48 UTC 2013
The news out of NYC is that the data should be free for us to use
(with a completely nonrestrictive license). Furthermore, the address
data should be out soon, which should give us exactly what we want:
buildings and addresses for all of NYC (that is, all 5 boroughs of New
York).
Once we have that dataset, we'll have a few issues to work through,
and I want to open this discussion for how to approach it:
1. Conflate the NYC building data with the NYC address data
My understanding of the address data is that it should be in the form
of point data, where the building data is polygons.
I know our normal process is to check each building polygon and see if
it contains one or more address points. If it's one address point, the
tags will apply to the building. If it's two or more points, we'll treat
each point as an entrance on the building and tag them separately.
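
To make that rule concrete, here is a minimal sketch in plain Python. It
is not the actual import script. The data layout (dicts with "ring",
"point" and "tags" keys), the function names, and the use of
entrance=yes on the address nodes are all my assumptions for
illustration, and it ignores inner rings and multipolygons.

```python
def point_in_polygon(point, ring):
    """Ray-casting test. ring is a list of (x, y) vertices of the outer ring."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def conflate(building, addresses):
    """Apply the one-point / several-points rule to a single building.

    building:  {"ring": [(x, y), ...], "tags": {...}}   (hypothetical layout)
    addresses: [{"point": (x, y), "tags": {...}}, ...]  (hypothetical layout)
    Returns (building_tags, entrance_nodes).
    """
    hits = [a for a in addresses
            if point_in_polygon(a["point"], building["ring"])]
    tags = dict(building["tags"])
    if len(hits) == 1:
        # Exactly one address point: the address goes on the building itself.
        tags.update(hits[0]["tags"])
        return tags, []
    # Zero points: nothing to add. Two or more: keep each as its own node.
    entrances = [{"point": a["point"], "tags": dict(a["tags"], entrance="yes")}
                 for a in hits]
    return tags, entrances
```

Calling conflate() on a building with a single address point inside it
returns the merged tags and an empty entrance list; with two or more
points inside, the building tags come back unchanged and each point
comes back as a separate node.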
I know this is generally a slow query to run, and the building dataset
has more than 1 million buildings, so this isn't a process I'd like to
repeat often.
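
One common way to keep this from comparing every point against every
building is to bucket the address points first and only test the points
near each building. The sketch below is a plain-Python stand-in for what
a spatial index does. The cell size and the function names are made up
for illustration, and the candidates it returns still need the exact
point-in-polygon check afterwards.

```python
from collections import defaultdict

CELL = 0.001  # hypothetical cell size, in the same units as the coordinates


def build_grid(points, cell=CELL):
    """Bucket point indices by the grid cell each point falls in."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(int(x // cell), int(y // cell))].append(idx)
    return grid


def candidates(grid, bbox, cell=CELL):
    """Indices of points in every cell that the bounding box touches.

    bbox is (min_x, min_y, max_x, max_y). The result is a superset of the
    points actually inside the box, so it is only a prefilter.
    """
    min_x, min_y, max_x, max_y = bbox
    found = []
    for cx in range(int(min_x // cell), int(max_x // cell) + 1):
        for cy in range(int(min_y // cell), int(max_y // cell) + 1):
            found.extend(grid.get((cx, cy), []))
    return found
```

The grid is built once over all address points; each building then only
looks up the handful of cells its bounding box covers.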
I know others have had to run this kind of query, e.g. Ian Dees in
Chicago and Paul Norman in Vancouver. Anything to share?
2. Conflate with OSM buildings
There are a number of building traces in NYC already, done by folks
like myself and a few others. If I had to guess, I'd say the number of
traced buildings is probably a few hundred. We will probably need to
look at each building individually and decide which to keep and which
to replace with the city's data.
3. Conflate with OSM address data
There's very little address data in OSM directly, but we have a number
of hand-collected POIs, many of which have address data.
If we suddenly find ourselves with all the building data and address
data for all of them, it seems like we should avail ourselves of this.
If we are 100% confident in the city data, we may want to adjust the
locations of these POIs to match (if a POI is not on a building with
its corresponding address). Or maybe the city data is wrong, or maybe
it's something else. I'm just tossing out ideas.
I want to stress we're not there yet with NYC data, but I'd like to
have thought this through a bit before the data arrives.