[Imports-us] Address and Building Data Conflation

Serge Wroclawski emacsen at gmail.com
Mon Jul 15 19:39:40 UTC 2013


On Mon, Jul 15, 2013 at 3:14 PM, Brian H Wilson <brian at wildsong.biz> wrote:
> Pulling this off the NYC topic and onto my rural project -- ignore if you
> are only interested in NYC!
>
>
> On 07/15/2013 08:08 AM, Serge Wroclawski wrote:
>>
>> 1. How to conflate the NYC Data for building and address data.
>>
>> My understanding of the address data is that it should be in the form
>> of point data, where the building data is polygons.
>
>
> When I was working on this a few weeks ago the message I got was that
> addresses should always be attached to buildings as tags and that there
> should not be separate address points. I am now confused. I spent a lot of
> time trying to come up with a good way to attach addresses to individual
> buildings. Maybe that was a waste of time?

You're missing the context of that quote. The "should be in the form
of point data" part refers to the form in which NYC would release
their data (should that actually happen), not to a process for OSM.

That's why I then discussed how to process this into OSM consumable data.

> Rural areas frequently have many buildings per tax lot. They are all at the
> same address yet having each barn and shed tagged with street number still
> seems wrong to me. So does trying to find an algorithm to pick out which
> building polygon is the house (123) and which is the granny unit out back
> (123 1/2) and which is the barn.

We discussed this on the hangout several weeks ago, and barns and
other secondary buildings probably shouldn't have addresses on them at
all.

> I am beginning to think that trying to pre-process ALL the data into perfect
> shape before an import is a lofty goal but perhaps means no import will ever
> happen because there are not many people here and vanishingly few capable
> volunteers. They are not up to being trained to use JOSM, their eyes glaze
> over in 30 seconds.

I am not sure what you are saying, but if you are saying that importing
into OSM in a sustainable way is hard, you're right. What's far worse,
though, is a bad import that has to be cleaned up later.

> In the past (not OSM) I have dealt with addresses by using the tax lot
> centroid to create a point layer and then (as time permits) adjusting the
> point to the appropriate location (sometimes it's center of the primary
> building, sometimes it's the driveway entrance to the property, depending on
> the local fire department since that's who I am working for.)

If you have a small number of objects, this may be doable.

The solution that we thought made more sense was, for residences, to
use the larger structure as the object with the address. We'll assume
that most people choose to live in the larger structure, and those
that don't (for example, that have a greenhouse that's larger than
their home), we can fix later.

Obviously for commercial property, this is not going to work as well.
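
A minimal sketch of the "larger structure gets the address" heuristic
for residential lots follows. The data layout (dicts with "lot",
"ring" and "tags" keys) and the function names are assumptions made
for illustration, not part of any existing import script, and the
coordinates are assumed to be in a projected (planar) system so that
areas are comparable.

```python
def ring_area(ring):
    """Planar area of a polygon ring of (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def assign_addresses(buildings, lot_addresses):
    """Tag only the largest building on each lot; leave the others untouched.

    buildings     -- list of dicts: {"lot": id, "ring": [(x, y), ...], "tags": {}}
    lot_addresses -- dict: lot id -> {"housenumber": str, "street": str}
    (Both structures are hypothetical.)
    """
    largest = {}
    for b in buildings:
        best = largest.get(b["lot"])
        if best is None or ring_area(b["ring"]) > ring_area(best["ring"]):
            largest[b["lot"]] = b

    for lot, b in largest.items():
        addr = lot_addresses.get(lot)
        if addr:
            b["tags"]["addr:housenumber"] = addr["housenumber"]
            b["tags"]["addr:street"] = addr["street"]
    return buildings
```

Barns, sheds and other secondary buildings on the same lot keep their
existing tags and receive no address, and the cases the heuristic gets
wrong (the oversized greenhouse) are left for manual correction.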

> It still seems better to me to keep the address numbers separated from the
> buildings,  as per your example when you want to separate two entrances to
> the same building or in my case when you want to have a large building with
> 10 addresses. If you put them in as points, and they are not perfect on this
> first pass, then searching on address will still get you close enough to
> find the front door and later on anyone can edit them to push them into a
> better position.

Naked address nodes have their own problems, and those problems are
worse than the conflation issues, IMHO.
>
>> I know our normal process is to check each building polygon and see if
>> it contains one or more address points. If it's one address point, the
>> tags will apply to the building. If it's two points, we'll treat each
>> point as an entrance on the building and tag them separately.
>
> What if there are 10 or more addresses for one building? This often happens
> in my area when there are townhouses or apartments and each one has a
> separate address. (Not just a unit number)

We can find that out and handle it:
http://wiki.openstreetmap.org/wiki/Addresses#Buildings_with_multiple_house_numbers

In fact the examples given cover the exact situation you're envisioning.
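
The point-in-polygon process described in the quoted text could be
sketched roughly as below. This is only an illustration under stated
assumptions: the dict layout, the function names and the choice to
emit entrance nodes tagged entrance=yes are hypothetical, and a real
script would also need to snap each entrance onto the building outline
and apply whatever the wiki page recommends for many-address buildings.

```python
def point_in_ring(pt, ring):
    """Ray-casting test: is the (x, y) point inside the polygon ring?"""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def conflate(buildings, address_points):
    """Merge address points into buildings.

    One point inside a building: copy its tags onto the building.
    Two or more points: keep each as a separate entrance node.
    No points: leave the building as it is.
    """
    entrances = []
    for b in buildings:
        hits = [p for p in address_points if point_in_ring(p["xy"], b["ring"])]
        if len(hits) == 1:
            b["tags"].update(hits[0]["tags"])
        elif len(hits) > 1:
            for p in hits:
                entrances.append({
                    "xy": p["xy"],
                    "building": b["id"],
                    "tags": {"entrance": "yes", **p["tags"]},
                })
    return buildings, entrances
```

Counting the hits per building first also makes it easy to list the
buildings with many addresses, so they can be reviewed by hand rather
than processed blindly.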

> Doesn't having a mix of points and polygons tagged in the same area make
> things confusing? Maybe this is just because I am new at OSM, so I am not
> used to having a single heterogeneous data set.

I think this is a three-part question:

1. Is it confusing to me?
2. Is it confusing to our tools?
3. Is it confusing to other mappers?

It's certainly not confusing to me. I'd rather all the buildings had
addr:street and addr:housenumber on them, and then we'd be all set. In
my mind, an address is an attribute of a building.

It's not confusing to our tools. Potlatch and iD have preset fields
for addresses.

As for other mappers, I think naked nodes are confusing because they
don't correspond to a physical, observable object. Maybe that's a
difference between the way a GIS person thinks, and an OSM person.

> In my region I have never seen data with buildings that already have
> addresses attached.

Address data is scarce for the whole world in OSM, and that's why
we're so focused on it.

> Where I work (OR|CA|WA), buildings are on tax lots and
> tax lots have addresses.

We expect anyone, with no external information, to be able to survey
an area. Tax lots aren't really surveyable, which is why we're
generally not in favor of including them on OSM.

> Are there 5 houses or 50
> apartments there? It is often ambiguous. It's good enough to get a firetruck
> to the front door of 1060 but would probably take someone who lives in the
> building(s) to accurately place individual points.
>
> Another good one that crops up is mobile home parks and vacation parks,
> where there can be 50 slots in one tax lot, each with a separate phony
> address. The tax assessor's official address is "123 Main St" but the PO
> still delivers mail to "5 Sunset Village Homes". This can't be dealt with
> algorithmically because it requires local knowledge.

I agree that local knowledge and manual survey are the best form of
data and what we should ideally be collecting.

But then I think "A million buildings sure is a lot of buildings" and
would prefer to start with *something*.

> I probably won't attend the hangout this afternoon because it seemed like I
> bumped someone else out last time who actually needed to be there, and I
> have not actually had any time to work on the Benton county import lately.

I don't think this week will be as busy. If we continually bump up
against the limit, I will look into alternatives.

- Serge


