[Imports] NYC building + address import - to merge or not to merge?

Serge Wroclawski emacsen at gmail.com
Mon Oct 14 17:01:21 UTC 2013

There's a bit of context missing from this conversation, so I want to
fill people in on what happened, and then we can discuss the technical

Alex made his proposal (with my endorsement) before the import
happened. Alex suggested [...] but then modified the data to this alternate
way before the import event. So now we have both kinds of addresses in
NYC. You can see some of the pre-import discussion at:

We also have an import event to show what happens with users and the data.

On Mon, Oct 14, 2013 at 12:35 PM, Alex Barth <alex at mapbox.com> wrote:

> Now there's reason to revisit this decision: the data steward (Colin Reilly
> from NYC GIS) told me that NYC GIS took great care to place addresses at
> about where the entrance of the building sits.

We have a tag "entrance", but when I suggested tagging the entrances as
entrances, Alex suggested that many of the points were not entrances,
but centroids.

If they're addresses, I say tag them as addresses. Then we can
encourage mappers on the ground to refine the entrances to match
ground truth, and also to add service entrances, etc.

> This makes me think that there's value in not tossing the address location
> information but keep it in all cases, even if there is only one address per
> building.
> Here is a comparison of the two options. I'd like to discuss and decide at
> tonight's imports hangout.
> ## Option 1: Merge addresses into buildings where possible
> In cases where there is one address point within a building polygon, we take
> address attributes, assign it to the building polygon and toss the address
> point.
> Pros:
> a) This is how a lot of buildings are done in OSM
> b) Not regarding standing practice, merging addresses into buildings is an
> exception from the generally applicable method of doing separate address
> points.
> Cons:
> a) we lose data

If we tag the entrances as entrances, as suggested in issue 15, we
lose no data.
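
To make that concrete, here is a rough sketch of one way the merge could
keep the point. It is my illustration, not the import script and not
necessarily what issue 15 proposes: the function name and the dict layout
are made up, and it assumes the address nodes inside each building have
already been found.

```python
def merge_single_address(building_tags, address_nodes_inside):
    """Option 1 merge that keeps the point as an entrance (hypothetical helper).

    building_tags: dict of OSM tags on the building polygon.
    address_nodes_inside: list of {"pos": (x, y), "tags": {...}} dicts for
    the address nodes already known to lie inside that building.
    Returns (new_building_tags, remaining_nodes).
    """
    # Option 1 only applies when exactly one address point is in the building.
    if len(address_nodes_inside) != 1:
        return dict(building_tags), list(address_nodes_inside)

    node = address_nodes_inside[0]
    addr = {k: v for k, v in node["tags"].items() if k.startswith("addr:")}

    # Move the address attributes onto the building polygon.
    merged = {**building_tags, **addr}

    # Instead of tossing the point, keep its location as an entrance node.
    # Assumption: the point really is at (or near) an entrance.
    entrance = {"pos": node["pos"], "tags": {"entrance": "yes"}}
    return merged, [entrance]


building = {"building": "yes"}
nodes = [{"pos": (2.0, 0.5),
          "tags": {"addr:housenumber": "10", "addr:street": "Example Street"}}]
new_building, new_nodes = merge_single_address(building, nodes)
```

Whether entrance=yes is right for every point is exactly the open
question, since some of the points may be centroids.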

> b) makes it harder for NYC GIS to leverage OSM

> ## Option 2: Always keep address points separate
> In this case we never merge addresses to building polygons, instead always
> keep them as separate entities.
> Pros:
> a) this is the NYC GIS way, making it nicer for GIS folks to use OSM
> b) this is the generally applicable method. No matter whether we have one or
> multiple addresses you can expect to find a separate node carrying address
> information.
> c) retains useful information
> Cons:
> a) Diverges (but does not violate [1]) common OSM practice

b) We see users mistagging addresses

In NYC, at the import, I've seen users tagging the address points with
building information, such as the type of building it is.

This confusion is probably going to continue, leading to more problems
in OSM where the attributes of the building are placed off the

c) We see multiple addresses

In the NYC import, I've found multiple addresses in/near a building.
This is from previous data, and needs cleanup. Multiple address points
aren't wrong when there's a POI and a building, but without a
building, it's confusing.

d) It adds an extra step for data consumers

If you tag the building with an address, you can get the address of it
by looking at it. And if you have a POI within the geometry of the
building, you can get it by looking at the building container, if the
node doesn't have it.

With nodes as points, you have to look at the poi node, then look at
the building, see that the building has no address, then look for a

If you tag the node as is done here, you have to look at the building,
see it has no address data, then look for nodes within it, and see if
they do.
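
Here is a rough sketch of the lookup a data consumer ends up writing when
both kinds of addresses exist. This is only an illustration under my own
assumptions: the function names and the dict layout are made up, the
coordinates are treated as planar, and a real consumer would use a proper
spatial index rather than a loop.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def address_tags(tags):
    """Return only the addr:* tags of an object."""
    return {k: v for k, v in tags.items() if k.startswith("addr:")}


def resolve_address(poi, buildings, address_nodes):
    """Find an address for a POI node (hypothetical helper).

    poi and address_nodes entries: {"pos": (x, y), "tags": {...}}
    buildings entries: {"outline": [(x, y), ...], "tags": {...}}
    """
    # Step 1: the POI itself may carry the address.
    own = address_tags(poi["tags"])
    if own:
        return own
    for building in buildings:
        if not point_in_polygon(poi["pos"], building["outline"]):
            continue
        # Step 2: the containing building may carry it (the merged case).
        on_building = address_tags(building["tags"])
        if on_building:
            return on_building
        # Step 3: the extra step, look for address nodes inside the building.
        inside = [n for n in address_nodes
                  if point_in_polygon(n["pos"], building["outline"])
                  and address_tags(n["tags"])]
        if len(inside) == 1:
            return address_tags(inside[0]["tags"])
        return None  # no address, or several with no way to choose
    return None


outline = [(0, 0), (4, 0), (4, 4), (0, 4)]
addr = {"addr:housenumber": "10", "addr:street": "Example Street"}
cafe = {"pos": (1, 1), "tags": {"amenity": "cafe"}}

merged_case = resolve_address(
    cafe, [{"outline": outline, "tags": {"building": "yes", **addr}}], [])
separate_case = resolve_address(
    cafe, [{"outline": outline, "tags": {"building": "yes"}}],
    [{"pos": (2, 0.5), "tags": dict(addr)}])
```

Both cases resolve to the same address, but the separate-node case needs
the third step, and it gives up when more than one address node sits
inside the building.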

e) It adds no explicit value without entrance tag

Tagging entrances with an address is useful. Not only does
it have value on its own, but you can even add a tag to indicate that
the entrance is off a certain street (we don't have a tag for this
now, but it wouldn't be hard to add).

But right now, we don't have the entrance tag, so we lose the benefits of both.

I think both ways are suboptimal, but they both bear consideration.

- Serge
