[Talk-us] Proposed mechanical edit - New York building footprints

Kevin Kenny kevin.b.kenny at gmail.com
Sat Mar 19 03:21:36 UTC 2022


(See a more detailed project plan, including scripts developed and used
so far, at https://github.com/kennykb/NYbuildings_repair)

In the last week or so, I've discovered a somewhat distressing fact about
building footprints in New York State. Well over 100,000 of them were
created by OSM users 'NYbuildings' and 'AlexCleary', both of which are
identified as pseudonyms for 'mileuthi'. The process appears to have been
that they were transcribed from Microsoft's AI-generated building
footprints, obtainable from
https://cugir.library.cornell.edu/?f%5Bcugir_category_sm%5D%5B%5D=structure,
together with the Street and Address Map data developed for the E911 system
and obtainable from
https://gis.ny.gov/gisdata/inventories/details.cfm?DSID=921.

Unfortunately, just under sixty thousand of them have errors in their
addresses. There are a handful of sporadic errors that appear to have
resulted from memory corruption (how else could a city name of '733e+001
WARR01094012915', '20170510JLevandowskiII', or 'ek' have arisen?), but the
vast majority appear to be accounted for by two systemic issues.

(1) The NYSGIS street address point database has three columns (street
address prefix, street address name, street address suffix) that make up a
street name.  About 31,000 affected buildings have been identified where
only the 'street name' portion was used, causing street names like
'West Main Street' to appear as just 'Main'.

(2) The `addr:city` fields appear to identify administrative boundaries
that contain the buildings, rather than give postal cities. The erroneous
`addr:city` values include the names of communities without post offices,
whose postal cities are the names of neighboring communities; buildings
near the border of a service area that are served from a post office in a
neighboring community; and even the names of census-determined places with
neither a post office nor a strong local identity.  About 32,500 buildings
- all in Suffolk County - have this problem.
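
For issue (1), the repair amounts to putting the three parts back together.
A minimal sketch in Python, assuming the three columns arrive as optional
strings; the function and parameter names are illustrative and are not the
actual NYSGIS column names:

```python
def full_street_name(prefix, name, suffix):
    """Join the three street-name parts, skipping empty or missing ones."""
    parts = (prefix, name, suffix)
    return " ".join(p.strip() for p in parts if p and p.strip())


print(full_street_name("West", "Main", "Street"))  # West Main Street
print(full_street_name(None, "Main", ""))          # Main
```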

Fortunately, a much cleaner import of the NYSGIS street address points by
Skyler Hawthorne makes it possible to identify thousands of questionable
addresses and improve on the current situation. His import was careful to
do the responsible thing and refrain from modifying any existing address
fields on a building (and was conducted after most of the damage discussed
here was done).  Even if it left an incorrect address in place, it still
tagged the building with the external identifier of the address point in
NYSGIS that appeared to match it.

In brief, what I'm proposing to do here is to structure an automated edit
to:

(1) identify buildings that were imported by either of the two users named
above and are tagged with NYSGIS address point IDs. The list of users may
expand, because I don't know what other sock puppets the importer might
have used.

(2) find the `addr:*` keys from the original import, the current state of
OSM, and the NYSGIS database.

(3) if an address field is unchanged since the original import and differs
from NYSGIS, replace it.
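
The three steps boil down to a three-way comparison per building. A rough
sketch in Python, assuming each version of the tags is available as a plain
dict; the function name and the dict layout are my own invention for
illustration and are not taken from the scripts in the repository. It also
assumes that a field absent from the original import is left alone:

```python
def repair_tags(original, current, nysgis):
    """Return the addr:* changes to apply to one building.

    original: tags as first imported (hypothetical dict layout)
    current:  tags as they stand in OSM now
    nysgis:   tags derived from the matching NYSGIS address point
    """
    changes = {}
    for key, authoritative in nysgis.items():
        if not key.startswith("addr:"):
            continue
        if key not in original:
            continue  # assumption: only touch fields the import created
        untouched = current.get(key) == original[key]
        if untouched and current.get(key) != authoritative:
            changes[key] = authoritative
    return changes


original = {"addr:street": "Main", "addr:city": "Foo"}
current = {"addr:street": "Main", "addr:city": "Bar"}  # a mapper fixed the city
nysgis = {"addr:street": "West Main Street", "addr:city": "Baz"}
print(repair_tags(original, current, nysgis))
# {'addr:street': 'West Main Street'}
```

The point of comparing against the original import is that any field a
mapper has touched since then is never overwritten.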

I don't claim that the result of this process will be perfect, but the data
we have in OSM now is worse than useless in the affected areas.  A glance at
http://overpass-turbo.eu/?Q=%5Bout%3Ajson%5D%5Btimeout%3A25%5D%3B%0A(%0A%20%20way%5B%22building%22%5D%5B%22addr%3Astreet%22%5D%5B%22addr%3Astreet%22!~%22%20%22%5D(%7B%7Bbbox%7D%7D)%3B%0A)%3B%0A%2F%2F%20print%20results%0Aout%20body%3B%0A%3E%3B%0Aout%20skel%20qt%3B&C=43.02217;-75.05583;15&R
shows the extent of the problem!
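
For reference, the query in that link selects building ways whose
`addr:street` value contains no space. The same heuristic as a Python
predicate (the function name is made up for illustration); note that it
also flags legitimate one-word names such as 'Broadway', so it finds
candidates rather than proven errors:

```python
def looks_truncated(street):
    """True when a street name contains no space, e.g. 'Main'."""
    return " " not in street


for street in ("Main", "West Main Street", "Broadway"):
    print(street, looks_truncated(street))
```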

If anyone can give me useful advice regarding managing an edit this big,
I'm all ears. I certainly don't want to manage 60,000 changes by accepting
them one at a time in JOSM!
-- 
73 de ke9tv/2, Kevin