[Imports] [Imports-us] Draft proposal for import of New York State GIS SAM Address Points

Wed Jan 13 05:53:23 UTC 2021

Sorry, this turned out to be a very long email! I'm glad for the discussion though, and thank you for the feedback!

Jan 12, 2021 17:06:18 Jmapb <jmapb at gmx.com>:

> Re: Tagging Plans
>
> I feel there's little value in importing addr:state=NY with these addresses, since we have well-mapped state borders and the state is also unambiguously encoded in the ZIP code. I wonder if there are good examples of addresses that would truly benefit from this field, possibly on properties that are very near state lines. (We also share a border with Canada of course, so perhaps a similar case could be made for addr:country in the northern reaches of Franklin and Clinton counties.)

I agree with Kevin Kenny on this: postal address delivery boundaries sometimes differ from civil boundaries. I myself live in a house that is just inside a city border, but the postal address is the neighboring city. With that said, I was curious to see if there were any examples, so I went digging and I was able to find only one example where the address is inside the NY civil boundary, but has a postal address with a different state:

2161 North Rd, PA 16928: https://www.openstreetmap.org/search?query=42.0004698601977%2C%20-77.5135375591495

There are a few examples of addresses that are PA, but the address point is actually also inside the PA civil boundary, and there was one MA address as well that was actually inside MA. I'm not sure why they were in the county data, but I guess that's another issue.

There were also a couple that had a state code of VT, but they are well within the NY civil boundary, so I suspect that those are errors.

In any case, considering we're talking about a single address point, I'm no longer sure it matters so much, so I could be convinced to remove it.

> Re: Data Reduction & Conflation
>
> "... checked against an Overpass API for whether the address already exists within a short distance of the point. If any element with the same house number, street, or unit exists, then the address point is skipped." Perhaps I'm misreading this, but it seems to say you will not import an address point if you find another element nearby tagged with the same addr:street? I'd expect to see many instances where a currently-mapped address neighbors an unmapped address on the same street, sometimes quite nearby in dense areas. This sounds like it might exclude a lot of valid addresses.

Sorry, this was poor wording. For each address point without a unit, it will look for anything with the same addr:housenumber and addr:street, and if both match, it skips it. If the address point has a unit, it also checks if the addr:unit matches, and if it does, it skips it. If it doesn't find anything that matches all 3, it looks for something with only the addr:unit, since sometimes people will put the addr:* tags on the building, and put points or ways inside that building with just the unit and nothing else.
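A rough sketch of the skip rule described above, in Python. This is only illustrative and not the actual import script: the function name, the plain dicts of tags, and the reading of "only the addr:unit" as "no other addr:* tags present" are all assumptions.

```python
def should_skip(point, nearby):
    """Return True if the address point already appears among nearby elements.

    point  -- dict of addr:* tags for the address point (assumed shape)
    nearby -- list of tag dicts for OSM elements found near the point
    """
    num = point.get("addr:housenumber")
    street = point.get("addr:street")
    unit = point.get("addr:unit")

    for tags in nearby:
        same_address = (
            tags.get("addr:housenumber") == num
            and tags.get("addr:street") == street
        )
        if unit is None:
            # No unit on the point: housenumber + street is enough.
            if same_address:
                return True
        elif same_address and tags.get("addr:unit") == unit:
            # Point has a unit: all three tags must match.
            return True

    if unit is not None:
        # Fallback: an element that carries only the unit, e.g. a node
        # inside a building that holds the rest of the address.
        for tags in nearby:
            addr_keys = {k for k in tags if k.startswith("addr:")}
            if addr_keys == {"addr:unit"} and tags["addr:unit"] == unit:
                return True

    return False
```

With this reading, a point for unit 2 is not skipped just because the building carries the housenumber and street; it is skipped only if something nearby matches all three tags, or carries that unit alone.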

I currently don't check if the addr:flats matches because that seems like it could get complicated with checking if a number is within a range, for example. But maybe checking for strict equality to skip might be prudent. If it's exactly equal, I know for sure it can be skipped. And if it's not exactly equal and lies inside an existing building with a different addr:flats, it will get marked for review.
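The strict-equality idea for addr:flats could look something like the sketch below. The function name and return values are hypothetical, apart from the review string, which is one of the values listed later in this email. It deliberately does no range parsing.

```python
def flats_check(point_flats, building_flats):
    """Compare addr:flats values by exact string equality only."""
    if point_flats is None or building_flats is None:
        return None  # nothing to compare
    if point_flats == building_flats:
        return "skip"  # exactly equal, so the point is already mapped
    # Not exactly equal: no attempt to test whether one value falls
    # inside the other's range, so leave it to a human.
    return "existing building has different flats"
```

Note that a point with flats "2" inside a building tagged "1-4" would be marked for review here rather than skipped, which is the conservative outcome.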

I'll update the wiki and fix the language.

> I like your plan to use the value of nysgissam:review to specify why the review is deemed necessary. I'd love to see a list of the possible values, but I assume that's still evolving.
>
> What's your reasoning behind not conflating address points that lie within a building:part? Not a big issue I presume, just curious.

This is the list so far, but yes, it can certainly evolve:

> "found > 1 existing matching address"

This will happen when the Overpass query finds multiple things with the same addr:housenumber and addr:street. This is likely a mistake in the existing OSM data. (I've seen at least one example of this.)
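For illustration, a query of this kind and the resulting check might be sketched as follows. This is not the real import code: the helper names, the 30 m radius, and the assumption that tag values contain no double quotes are all mine. The query text uses the Overpass QL `around` filter.

```python
def build_query(lat, lon, housenumber, street, radius_m=30):
    """Build an Overpass QL query for elements near (lat, lon) that carry
    the given addr:housenumber and addr:street.

    Assumes the values contain no double quotes (no escaping is done).
    """
    return (
        "[out:json];"
        f"nwr(around:{radius_m},{lat},{lon})"
        f'["addr:housenumber"="{housenumber}"]'
        f'["addr:street"="{street}"];'
        "out tags;"
    )


def review_reason(elements):
    """Return the review value if more than one existing element matched."""
    if len(elements) > 1:
        return "found > 1 existing matching address"
    return None
```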

> "multipolygon"

I tag these for review because conflation can be tricky to automate. It's sometimes obvious when a human is looking at the building, but I imagine there are corner cases where the point is inside a "hole" in the building, or if there are two separate buildings, figuring out if the address belongs to just one of them or both.
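To make the "hole" case concrete, here is a small self-contained sketch of a point-in-multipolygon test using ray casting. It treats coordinates as planar (a reasonable approximation at building scale), and the function names and ring representation are assumptions for illustration only.

```python
def in_ring(pt, ring):
    """Ray-casting test: is pt inside the closed ring of (x, y) vertices?"""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def locate(pt, outers, inners):
    """Classify pt as 'inside', 'hole', or 'outside' for a multipolygon."""
    if not any(in_ring(pt, ring) for ring in outers):
        return "outside"
    if any(in_ring(pt, ring) for ring in inners):
        return "hole"  # e.g. a courtyard: within the outline, not the building
    return "inside"
```

Even with a test like this, deciding what a point in a courtyard or between two outer rings should be attached to is still a judgment call, which is why these go to review.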

> "inside multiple ways"

This gets to your question about building:part: my understanding is this tag's value can be basically any valid value you could use for building=*, which includes things like building:part=roof, which would be a funny thing to have its own address!

It might be possible to enumerate a few values and pick and choose which ones are okay to conflate and which ones aren't, but I think the benefit starts to diminish relative to the effort at that point. So I just decided to let a human decide how to conflate.

> "existing building has different housenumber"
> "existing building has different unit"
> "existing building has different flats"

These are set when the point lies inside a building, but that building has a conflicting address tag. I actually wanted to ask for some advice about this: I've been seeing more than a few examples where the existing data conflicts with the state's data. See the following screenshot of data in Watervliet: https://skyler-public.s3.amazonaws.com/images/Screenshot_20210112-235109.jpg

It appears there has already been an import by a user named OceanVortex, and they just cite "Watervliet GIS" as their source. The yellow highlighted nodes are ones that came from the state that are marked because of conflicting data.

At the lower left, you can see the existing data says that building is 610, but the state says it's 612. Who is right? Is it one or the other, or maybe both?

My inclination is to favor existing data, but the only way to be sure who is right is to actually go there in person, and maybe even knock on the door and ask. Is it better to delete these during review, or leave them in case someone local is willing to do some detective work? I'm torn between not wanting to litter the map with a ton of these addresses that need review and could potentially be wrong, and filling in data that could potentially be right.

> "repeat address"

The state data itself has the same address more than once. I asked the GIS office about some examples of these repeats, and it seems they have some address points that are differentiated only by a field named LandmarkName that they have only in their internal database. This field has not yet been approved for public consumption because they have not gotten permission from all the counties, so they are scrubbing it from the public data.

> Re: Modifying Existing Elements
>
> Personally I would love to see city and ZIP added to currently-mapped addresses that lack them. Maybe also consider flagging for review when currently-mapped values don't match the SAM values. I'm less keen on adding nysgissam:nysaddresspointid to addresses that didn't originate with this import, since I'd like to be able to tell at a glance if a given address was mapped or imported. Perhaps there could be some variation in the tagging... nysgissam:imported_addressid versus nysgissam:matched_addressid?

I was on the fence about it before, but yeah, I think you guys are right: it would be good to fill this data in as well. I'll put that on my to do list.

Regarding the address point ID tag variations: that's an interesting idea, but when would we say it was "matched" versus "imported"? Is it only "imported" if the existing thing has no addr:* tags at all, and "matched" if the import is just filling in some missing fields, or adding a new node? And do we add the "matched" ID to everything, even when all the tags match (in which case, we'd be basically modifying every single existing element with an address)?

> Re: Addresses of new developments
>
> It makes sense to add addresses to parcels where construction has actually commenced, but IMO adding recordkeeping addresses to completely undeveloped land strains the spirit of OSM a bit. If there's truly no way to distinguish between these two types of development status, I feel that the upside of importing them probably still outweighs the downside.

This is what Craig Fargione had to say about distinguishing them:

>I would echo Frank’s response about the validity of addresses even with missing structures. Developments are often addressed in the early stages when the lot gets subdivided, and then as individual building lots are purchased, the structures are built. In some cases, each lot has an address number on it so I certainly think it would be beneficial to include it in OSM. However, if you determine you only want to include points that are on structures, you could use the PointType field for that. There are 5 different types of points in our file:
>
>1 – Rooftop, 2 – Primary Structure Entrance, 3 – Driveway Entrance, 4 – Parcel Centroid, and 5 – Miscellaneous. Miscellaneous points contain different types of structures or locations, such as utility stations, cell towers, public recreation area, etc. The PlaceType field contains the general category that the Miscellaneous point falls within. If you only wanted structure points, you could use PointType 1, 2, and 3 and get the majority of the points that are directly related to a structure in our file.

I already exclude type 5 as I describe in the wiki. From looking at the data, these nonexistent addresses are mapped as type 4, parcel centroids. However, there are still lots of points that do exist but are still parcel centroids, because the GIS departments themselves make these points from satellite imagery, and they remain parcel centroids until new imagery is available and someone from the state updates them. My own home is an example of this, as it's a building that's only a few years old.

Since there is no way to distinguish "new but actually exists" from "planned but not yet started", I think it's better just to include them.
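As a sketch, the resulting filter on the PointType field would amount to something like this. The type names come from Craig's description above; the record shape (a dict, as might come from a CSV reader) and the helper name are assumptions.

```python
POINT_TYPES = {
    1: "Rooftop",
    2: "Primary Structure Entrance",
    3: "Driveway Entrance",
    4: "Parcel Centroid",
    5: "Miscellaneous",
}

# Only Miscellaneous is dropped; parcel centroids stay in because
# "new but actually exists" cannot be told apart from "planned".
EXCLUDED_TYPES = {5}


def keep_point(record):
    """Return True if the record's PointType should be imported."""
    return int(record["PointType"]) not in EXCLUDED_TYPES
```

Restricting to structure-related points only, as Craig suggests, would mean changing the excluded set to {4, 5}.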

> Numbered Routes
>
> Browsing through the online map of the SAM address points ( http://www.arcgis.com/home/webmap/viewer.html?url=http%3A%2F%2Fgisservices.its.ny.gov%2Farcgis%2Frest%2Fservices%2FSAM_Address_Points%2FMapServer&source=sd ) I see a lot of variation on how the numbered routes, and the street names of the associated address points, are named. Eg, Route 23, State Route 23, State Highway 23. And of course the current data on OSM is just as messy. I can't help but think this might be a good time to discuss standardizing these statewide.

Yes, I've seen previous discussion of this topic, and how messy the existing data is. I think I'm on the same page as Kevin Kenny: I think until there is standardization, it's best to leave it as it is in the data.

> New York City
>
> There was a full import of building footprints and conflated address data for all 5 NYC boros in 2013, from the city's own open data. Wondering if there's any special consideration given to integrating with these, or is it better to just wall it off and work strictly from the city's data?

I think it might be worth looking at, but yeah, it might well be better just to leave out NYC altogether, depending on whether or how much it differs from the state data. I already mentioned my problems with Watervliet.

> ...These are the questions that occurred so far. They do not constitute an objection to beginning a small test import! Good luck and keep us informed. When this begins to roll out for real, I hope to be able to assist with the import and/or QA labor.
>
> Thanks, Jason
>

Awesome, thank you!

> We also need to flag for review if we find addr:street for which no corresponding `highway=*` exists. I have found cases local to me where there are address points on never-built streets, and some of them collide with buildings that have addresses on adjacent streets. (I actually need to field-survey in there, because there may actually be one or two houses with street addresses on the nonexistent street!)

I actually considered this, because it will help identify cases like new developments that don't yet exist. And it can also be useful to find errors in the existing OSM data; I actually found an example of a road that was misnamed because of this discrepancy: https://www.openstreetmap.org/changeset/94660683

However, I realized that it will also probably add a lot of noise with the number of things flagged for review. The names of state routes are probably going to be the biggest offender here. They are often not named at all, and just have a ref. And trying to special case state routes for this checking seems like a rabbit hole, given the lack of standardization.
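To illustrate both the check and the noise problem, here is a minimal sketch that compares addr:street values against the name tags of nearby highways. The function name and input shapes are hypothetical, and the street names in the usage note are made up.

```python
def streets_without_highway(points, highways):
    """Return addr:street values that match no highway's name tag.

    points   -- list of tag dicts for address points
    highways -- list of tag dicts for highway=* ways in the area
    """
    names = {h["name"] for h in highways if "name" in h}
    missing = {
        p["addr:street"]
        for p in points
        if p.get("addr:street") and p["addr:street"] not in names
    }
    return sorted(missing)
```

For example, given highways `[{"highway": "residential", "name": "Elm Street"}, {"highway": "primary", "ref": "NY 23"}]`, an address on "State Route 23" would be flagged even though the road exists, because the way has only a ref and no name. That is the kind of false positive I'm worried about.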

So I'm torn.