[Imports] [Talk-us-newyork] [Imports-us] NYS GIS SAM Address Point Import - Status Update

Skyler Hawthorne osm at dead10ck.com
Sat May 29 03:52:11 UTC 2021


On Fri, May 28, 2021, at 14:00, Jack Arnold via Talk-us-newyork wrote:
> Hello again,
> 
> I looked over Hamilton County around Lake Pleasant. Looking great as
> usual! The data over there seems simple with few apartment complexes
> or other edge cases.
> 
> My understanding of the conflation is a little bit rusty, but I found
> these nodes that don't seem to be correct in Watkins Glen (Schuyler
> county):
> 
> Original:
> https://www.openstreetmap.org/node/7685157126
> From import:
> https://www.openstreetmap.org/node/8759324317
> From import, but housenumber is different:
> https://www.openstreetmap.org/node/8759324646
> 
> The first two nodes look like they should be conflated, as the data is
> identical. The third one makes sense, the addr:housenumber tag is
> technically different. There are more of these in the same area I
> didn't list.

Yeah, in this case, there was already a duplicate node in the preexisting data (this is what the review tag "found > 1 existing matching address" means). There were these two before the import:

https://www.openstreetmap.org/node/7685157126
https://www.openstreetmap.org/node/7685157097

When there are multiple matches for the same address, the importer intentionally does not conflate it, because it is ambiguous which one is the "right one," so instead we mark it for review so a human can figure out which one is right and deduplicate.
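A minimal Python sketch of that decision, not the importer's actual code. Elements are represented as plain tag dictionaries, and the function names `find_matches` and `decide` are hypothetical; only the review message text comes from this thread.

```python
def find_matches(imported, existing):
    """Indexes of existing elements with the same house number and street."""
    number = imported.get("addr:housenumber")
    street = imported.get("addr:street")
    if not (number and street):
        return []  # too little information to match safely
    return [
        i for i, tags in enumerate(existing)
        if tags.get("addr:housenumber") == number and tags.get("addr:street") == street
    ]


def decide(imported, existing):
    """Return (action, detail) for one imported address point."""
    matches = find_matches(imported, existing)
    if len(matches) == 1:
        return ("conflate", matches[0])  # merge tags into the single match
    if len(matches) > 1:
        # Ambiguous: add the point as-is and flag it so a human can deduplicate.
        return ("review", "found > 1 existing matching address")
    return ("add", None)  # no existing address; add a new node
```

With two identical preexisting nodes for the same address, `decide` returns the review flag instead of picking one of them arbitrarily.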

My guess in this case is that this is a rogue import of the same data (either directly or indirectly), and one of the two is actually supposed to be "311 1/2" (which was imported as https://www.openstreetmap.org/node/8759324646). Either the source data for that import did not fill in the pre-address number field for "addr:housenumber", or this RI-Improve user didn't make use of it, creating what looks like a duplicate.

> Similarly, there are a few buildings in Montour Falls (Schuyler county)
> where the original contributor only put partial data:
> https://www.openstreetmap.org/node/3810205975
> https://www.openstreetmap.org/way/823147605
> 
> More scattered around the block:
> https://www.openstreetmap.org/node/3810205974
> https://www.openstreetmap.org/way/823147893

Yeah, partial data is tricky. In an early iteration of the importer, I actually did fall back to just the house number when it didn't find a house number + street match, but that led to its own problems. Addresses were getting skipped because, sure enough, there were different houses with the same number very close to each other, but on different streets. Even worse than skipping, it could end up filling in the missing data on the wrong house: it would match the house at 123 B Street when it meant to find 123 A Street, because the existing element only had the house number. To avoid false positives when searching for existing data, we need at least a house number and a street to be sure it's the right match.

So I think in these cases, we have a choice between falling back to the house number alone, which risks skipping good data or conflating with the wrong house, and "duplicating" the address with another node. Personally, I'd rather duplicate it so we neither skip good data nor conflate with the wrong building.
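To illustrate why the house-number-only fallback was dropped, here is a hypothetical Python sketch (not the importer's code; the function names and the A Street/B Street data are made up) comparing a strict house number + street match with that fallback:

```python
def strict_matches(imported, existing):
    """Match only when both house number and street are present and equal."""
    number = imported.get("addr:housenumber")
    street = imported.get("addr:street")
    if not (number and street):
        return []
    return [
        i for i, tags in enumerate(existing)
        if tags.get("addr:housenumber") == number and tags.get("addr:street") == street
    ]


def fallback_matches(imported, existing):
    """The discarded approach: if there is no strict match, compare house numbers only."""
    found = strict_matches(imported, existing)
    if found:
        return found
    number = imported.get("addr:housenumber")
    return [i for i, tags in enumerate(existing) if number and tags.get("addr:housenumber") == number]
```

With the fallback, a partial element tagged only `123` (really a house on B Street) gets matched to 123 A Street, and when a full 123 B Street and a partial 123 both exist, the two matches make the point ambiguous so it gets skipped. The strict match finds nothing in either case, so the imported point is added as a new node instead.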

> In Penn Yan (Yates county), more I didn't list:
> https://www.openstreetmap.org/node/8766705037
> https://www.openstreetmap.org/way/823227024
> 
> https://www.openstreetmap.org/node/8766705496
> https://www.openstreetmap.org/way/823227003
> 
> https://www.openstreetmap.org/node/8766705350
> https://www.openstreetmap.org/node/8766705421
> https://www.openstreetmap.org/way/823235994

These are actually not conflated on purpose because there are several other address points inside the same building. I did this because the location of the point often matters, especially in larger buildings.

In these cases, I could have chosen to conflate the address point whose house number matched with the building and leave the rest as points, but the addresses on the buildings are not really correct: they imply that the whole building has that single house number, when it in fact has several. When there is a problem with the existing data, detecting every imaginable problem and deciding what to do about each one would be time consuming and error prone, and the result would not be worth the effort; so, generally, I prefer to leave bad data alone and let a human decide what to do with it.
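The building rule could be sketched like this in Python, assuming each building is a simple polygon ring of (lon, lat) pairs with no holes. The even-odd ray-casting test and the function names are my own illustration, not the importer's implementation, and treating lon/lat as planar coordinates is only reasonable at the scale of a single building:

```python
def point_in_polygon(pt, ring):
    """Even-odd ray casting: count edge crossings of a ray going east from pt."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_to_merge(ring, points):
    """points: list of (id, (lon, lat)). Return the id to merge into the building, or None."""
    inside = [pid for pid, pt in points if point_in_polygon(pt, ring)]
    # Only the unambiguous case is conflated: exactly one address point inside.
    return inside[0] if len(inside) == 1 else None
```

If more than one address point falls inside the building, nothing is merged and every point keeps its own location.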

> Also, I found what appears to be a building with many units not
> conflated. Would this be correct behavior if they are separate physical
> buildings? If there was a large building under all of them, would it
> conflate? Montour Falls (Schuyler county):
> 
> https://www.openstreetmap.org/node/8759325174
> https://www.openstreetmap.org/node/8759325761
> https://www.openstreetmap.org/node/8759324918
> https://www.openstreetmap.org/node/8759324798

This looks correct to me. The one point without a unit looks like the primary address point, and the rest are the locations of the individual units. I chose to combine units into the primary point only when they are all stacked on top of each other, since a pile of points at the same spot doesn't really help anyone. But these unit points show you where each unit is, which is helpful.

As for buildings, the intended behavior is: in the first phase, address points that are stacked on top of each other and all have the same house number get combined if the list of units/floors/rooms can fit into a single tag (otherwise they are left as separate stacked nodes). In the second phase, the result is compared with the existing OSM data. If an address point is inside a building, it only gets conflated in the obvious case: it is the only address point inside the building. If there are multiple address points inside the building, then they are left separate, as they often show reasonably exact locations of units or house numbers within the building.
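A hypothetical Python sketch of the first phase, handling only `addr:unit`: points at exactly the same coordinate with the same house number and street are merged when their units, joined with `;` (the usual OSM separator for multiple values, assumed here), fit within the OSM API's 255-character limit on tag values. Floors and rooms would need the same treatment, and the input format is made up for illustration:

```python
from collections import defaultdict

MAX_TAG_LEN = 255  # OSM API limit on the length of a tag value


def combine_stacked(points):
    """points: dicts with 'coord', 'addr:housenumber', 'addr:street' and optional 'addr:unit'."""
    groups = defaultdict(list)
    for p in points:
        key = (p["coord"], p["addr:housenumber"], p.get("addr:street"))
        groups[key].append(p)

    result = []
    for members in groups.values():
        # Distinct unit values, sorted as strings and joined into one list value.
        units = sorted({p["addr:unit"] for p in members if p.get("addr:unit")})
        value = ";".join(units)
        if len(members) > 1 and len(value) <= MAX_TAG_LEN:
            # Keep the shared tags of the first point, replacing its unit with the list.
            merged = {k: v for k, v in members[0].items() if k != "addr:unit"}
            if units:
                merged["addr:unit"] = value
            result.append(merged)
        else:
            # Single point, or the unit list is too long: leave the nodes as they are.
            result.extend(members)
    return result
```

Points that share a house number but sit at different coordinates are never merged, which keeps the per-unit locations described above.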

> As a side note, this holds the record for worst (but correct?)
> rendering node I've ever seen:
> https://www.openstreetmap.org/node/8759326333

Haha, yeah, Carto doesn't handle list values well. It's useful to render the unit when it's only a single unit, but that utility breaks down when the value is a list of units.

> I know I raise more issues than solutions, but I hope they are helpful.
> I'd be willing to manually edit some of these if needed. Excellent work
> as always, and I can't wait to see the whole state imported.
> 
> Jack

Thanks so much for reviewing the data, it's super appreciated. Hopefully my answers make sense, and sound like reasonable decisions. I'm open to feedback if you have any.