[Imports] [Imports-us] [Talk-us-newyork] Update: New York State GIS SAM Address Points Import

Skyler Hawthorne osm at dead10ck.com
Sun Feb 28 04:50:15 UTC 2021


Hi everyone. I have good news and bad news. The good news is I figured out why a few address points did not conflate with the buildings they were inside, and I've put a safeguard in place to keep it from happening again.

Here's the bad news. I ran the import again over a few of the affected areas, and strangely, it conflated correctly. So why did it happen to begin with? After several hours looking closely at my own code for a bug, I concluded that the only possible explanation was that the server was returning incorrect results, saying there was no building there. Sure enough, that is what's happening, as these debug lines show:

2021-02-27T21:09:49.075154944-05:00 WARN nys_gis_sam_import_rs::filter - received old response: OSMDocument { osm3s: Osm3s { timestamp_osm_base: 2021-02-22T02:20:39Z }, elements: [] }
2021-02-27T21:09:49.198089006-05:00 WARN nys_gis_sam_import_rs::filter - received old response: OSMDocument { osm3s: Osm3s { timestamp_osm_base: 2021-02-22T02:20:39Z }, elements: [] }
2021-02-27T21:09:49.270413485-05:00 WARN nys_gis_sam_import_rs::filter - received old response: OSMDocument { osm3s: Osm3s { timestamp_osm_base: 2021-02-22T02:20:39Z }, elements: [] }

We can see the server is returning responses that say there are no elements that match the query. The timestamps are 5 days old, and they are all the same.

This problem has happened before, at an early stage of developing my program. I filed a support ticket, and Kumi told me that a couple of recently provisioned servers had not bootstrapped correctly and were not updating their OSM data. Unfortunately, it appears the problem was not actually fixed. At the time, I tried to add a safeguard that checked the age of each response, but I could not verify that it caught old responses, because they had fixed it on their end by the time I got to try it out. And unfortunately, I flipped the arithmetic in that check, so it never worked properly.
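For anyone curious what a working age check looks like, here is a minimal sketch. It is hypothetical, not the importer's actual code: it assumes `timestamp_osm_base` is available as a `YYYY-MM-DDTHH:MM:SSZ` string (the format in the log lines above), parses it with only the standard library, and treats anything older than a made-up one-hour `MAX_AGE_SECS` as stale. The important detail is the order of the subtraction: age is `now - base`. With the operands swapped, any past timestamp yields a negative number, which is never greater than the limit, so stale data would pass as fresh.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical staleness limit (one hour); not taken from the importer.
const MAX_AGE_SECS: i64 = 60 * 60;

/// Days since 1970-01-01 for a Gregorian date (Howard Hinnant's days_from_civil).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = (m + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + d - 1; // day of the March-based year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Parses "YYYY-MM-DDTHH:MM:SSZ" into Unix seconds; None for any other shape.
fn parse_osm_base(ts: &str) -> Option<i64> {
    let b = ts.as_bytes();
    if b.len() != 20 || b[4] != b'-' || b[7] != b'-' || b[10] != b'T'
        || b[13] != b':' || b[16] != b':' || b[19] != b'Z'
    {
        return None;
    }
    // Each field must be plain ASCII digits (rejects signs such as "+5").
    let num = |r: std::ops::Range<usize>| -> Option<i64> {
        let s = ts.get(r)?;
        if s.bytes().all(|c| c.is_ascii_digit()) { s.parse().ok() } else { None }
    };
    let (y, mo, d) = (num(0..4)?, num(5..7)?, num(8..10)?);
    let (h, mi, s) = (num(11..13)?, num(14..16)?, num(17..19)?);
    if !(1..=12).contains(&mo) || !(1..=31).contains(&d) || h > 23 || mi > 59 || s > 60 {
        return None;
    }
    Some(days_from_civil(y, mo, d) * 86_400 + h * 3_600 + mi * 60 + s)
}

/// Age of the data: now minus base. Swapping the operands is the classic bug.
fn age_secs(base: i64, now: i64) -> i64 {
    now - base
}

/// Err if the response is older than MAX_AGE_SECS (or unparseable), so the
/// caller can retry instead of trusting an empty `elements` list.
fn check_fresh(timestamp_osm_base: &str, now: i64) -> Result<i64, String> {
    let base = parse_osm_base(timestamp_osm_base)
        .ok_or_else(|| format!("bad timestamp: {timestamp_osm_base}"))?;
    let age = age_secs(base, now);
    if age > MAX_AGE_SECS {
        Err(format!("stale response: data is {age} s old"))
    } else {
        Ok(age)
    }
}

/// Current time as Unix seconds.
fn now_unix() -> i64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_secs() as i64
}
```

A caller would run something like `check_fresh(ts, now_unix())` on each response before acting on it. For the log lines above (written at 2021-02-28T02:09:49Z), a base of 2021-02-22T02:20:39Z is 517,750 seconds (just under six days) old, so the check rejects it.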

It pains me to say that this is likely to have affected all the imports I've done this week. If the problem began on 22 Feb, then the affected data includes Erie, Warren, Dutchess, and Ulster counties, which combined total about 600,000 addresses.

Fortunately, a good proportion of the responses appear to contain recent data, and even among the outdated responses, not all returned zero elements, so not every response from the bad server(s) would have led to an incorrect result. It is therefore impossible to know accurately what proportion of the affected addresses was imported incorrectly. If the Buffalo area that Jack pointed out is any reflection, it doesn't look like a huge proportion.

As for what to expect for the affected addresses: if we assume that any of the queries the importer performs could incorrectly return zero elements, then we could see:

* Duplicate addresses
* Address points that should have been conflated with a building, but were left as nodes
* Elements falsely tagged for review because no street with the name in the address was found
* Any combination of the above
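To illustrate why a single stale, empty response can cause each of these symptoms, here is a deliberately simplified, hypothetical decision function. The struct, field names, and rules are assumptions for illustration only, not the importer's real logic. It assumes three lookups per address point: existing elements with the same address, buildings enclosing the point, and streets matching `addr:street`. An empty response zeroes all three counts.

```rust
/// Hypothetical per-address query counts, each from its own Overpass query.
#[derive(Debug, Default, Clone, Copy)]
struct QueryCounts {
    existing_addresses: usize,  // OSM elements already carrying this address
    enclosing_buildings: usize, // buildings that contain the address point
    matching_streets: usize,    // streets whose name matches addr:street
}

#[derive(Debug, PartialEq)]
enum Action {
    Skip,              // address already in OSM, nothing to add
    MergeIntoBuilding, // copy addr:* tags onto the single enclosing building
    AddNode,           // upload a standalone address node
}

#[derive(Debug, PartialEq)]
struct Plan {
    action: Action,
    needs_review: bool, // flagged because no matching street was found
}

/// Simplified, hypothetical rules: skip known addresses, merge when exactly
/// one building encloses the point, otherwise add a node.
fn plan(q: QueryCounts) -> Plan {
    let action = if q.existing_addresses > 0 {
        Action::Skip
    } else if q.enclosing_buildings == 1 {
        Action::MergeIntoBuilding
    } else {
        Action::AddNode
    };
    Plan { action, needs_review: q.matching_streets == 0 }
}
```

With fresh data, a point inside one building merges into it; with an all-zero (stale) response, the same point becomes a standalone node flagged for review, and an address that already existed in OSM would be added again as a duplicate.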

All in all, this is super unfortunate, but I would venture to say it's not a catastrophe. It is extra frustrating since I had already reported this problem to Kumi a few months ago and they still have not fixed it.

I think at some point, I will add a section to the wiki page that details some of what happened, and describe what to expect, should people come across these issues in the data.

Feb 26, 2021 00:46:05 Skyler Hawthorne <osm at dead10ck.com>:

> So it turns out the issue for the missing addresses was simple/dumb: one of my osc part files failed on upload because of a conflict, and I missed it 😟 but fortunately, I've been archiving the osc and diff.xml files for just this purpose, which is how I noticed one of the parts was missing a diff.xml. This area now has all the addresses filled in.
> 
> I'm still not sure why there are those points that did not get conflated with the buildings when there is only one. I will continue to dig and see if I can figure it out.
> 
> Feb 25, 2021 16:20:17 Jack Arnold via Imports-us <imports-us at openstreetmap.org>:
> 
>> I pulled "Elmwood Village Area" off one of the building tags. It's not
>> only in that area. Here are coordinates to areas I found where
>> buildings are well mapped but few or no addresses were imported:
>> 
>> 42.9479,-78.8571
>> 42.9251,-78.8859
>> 42.9488,-78.8242
>> 
>> On Thu, 2021-02-25 at 14:34 -0500, Skyler Hawthorne wrote:
>>> On Thu, Feb 25, 2021, at 13:21, Jack Arnold via Imports wrote:
>>>> Hi Skyler,
>>>> 
>>>> Good work as usual! I might have found some more weird behavior, but
>>>> overall a smooth import.
>>>> 
>>>> I'm looking at the "Elmwood Village Area", and there are few (~3 per
>>>> block) address points imported in comparison to many buildings. This
>>>> appears to be the case for most of the area north of downtown Buffalo.
>>>> In city blocks where there aren't any buildings, the address points
>>>> are populated. Not sure if this data is just missing from the source
>>>> dataset.
>>> 
>>> Hi Jack, thanks so much for looking over the data! A search for
>>> "Elmwood Village" didn't turn anything up in Nominatim; could you
>>> give a coordinate with the approximate location so I can take a look,
>>> just to confirm whether or not this data is simply missing from the
>>> source?
>>> 
>>>> 
>>>> There are also some address points imported that didn't get merged
>>>> with building footprints:
>>>> 
>>>> https://www.openstreetmap.org/node/8454667934
>>>> https://www.openstreetmap.org/node/8454665917
>>>> https://www.openstreetmap.org/node/8454666256
>>>> 
>>>> The building footprints under them are only tagged "building=yes",
>>>> so maybe it confused the software when trying to read for the "addr"
>>>> tags?
>>> 
>>> Huh, that is strange. There are a few of these in this area. There
>>> are lots of other points that got conflated with their buildings, so
>>> it seems to be particular to this area. I'll take a closer look
>>> tonight and see if I can figure out what happened. Thanks for finding
>>> this!
>>> 
>>> _______________________________________________
>>> Imports mailing list
>>> Imports at openstreetmap.org
>>> https://lists.openstreetmap.org/listinfo/imports
>> 
>> 
>> 
>> _______________________________________________
>> Imports-us mailing list
>> Imports-us at openstreetmap.org
>> https://lists.openstreetmap.org/listinfo/imports-us