[Talk-us] Proposed mechanical edit - New York building footprints

Kevin Kenny kevin.b.kenny at gmail.com
Sat Mar 19 15:18:46 UTC 2022


On Sat, Mar 19, 2022 at 9:51 AM Greg Troxel <gdt at lexort.com> wrote:

>
> Kevin Kenny <kevin.b.kenny at gmail.com> writes:
>
> > If anyone can give me useful advice regarding managing an edit this big,
> > I'm all ears. I certainly don't want to manage 60,000 changes by accepting
> > them one at a time in JOSM!
>
> From your description, I think you are doing this right.
>
> When we (the MA osm community) imported building footprints from
> MassGIS, the basic process was to take that dataset (about 2M
> footprints) and to basically run a diff/overlap against OSM and produce
>
>   footprints that are in MassGIS and do not overlap anything in OSM.
>   This is the candidate import set.  It was big, as most of MA did not
>   have buildings imported before and most buildings had not been
>   hand-drawn.  I'll guess over 1.5M.
>
>   Footprints in MassGIS that overlap.  This is just something to look at
>   to see if hand editing is helpful.  We didn't really dig into it, and
>   we certainly didn't upload it.
>
> It's just code to make this, and it produces really large output.
> Obviously your rules for sorting and your code are going to be quite
> different, and I'd recommend having this all in postgis.
>
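The overlap split described above can be sketched as follows: footprints that overlap nothing in OSM go to the candidate import set, and overlapping ones go to a review pile. This is a minimal illustration, not the MA community's actual code. It assumes each footprint is reduced to an axis-aligned bounding box (`Box` and `partition` are hypothetical names), whereas a real run would test true polygon intersection, e.g. with PostGIS `ST_Intersects` backed by a spatial index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box, standing in for a real footprint polygon."""
    minx: float
    miny: float
    maxx: float
    maxy: float

    def intersects(self, other: "Box") -> bool:
        # Boxes that merely touch count as intersecting.
        return (self.minx <= other.maxx and other.minx <= self.maxx
                and self.miny <= other.maxy and other.miny <= self.maxy)


def partition(external, existing):
    """Split external footprints into (candidates, overlaps) against OSM data.

    Brute force, O(len(external) * len(existing)); a spatial index is
    what makes this feasible for millions of footprints.
    """
    candidates, overlaps = [], []
    for fp in external:
        if any(fp.intersects(b) for b in existing):
            overlaps.append(fp)    # set aside for possible hand review
        else:
            candidates.append(fp)  # nothing in OSM here: import candidate
    return candidates, overlaps
```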

Thanks for the insights; this is quite helpful. And yeah, I'm using PostGIS
quite heavily.  If you looked at the GitHub page, you'll probably suspect
that the two maps of Suffolk County were produced by doing the data
reduction in PostGIS and the data presentation in QGIS - and you'd be
right. For those maps, the query that did the heavy lifting is at
https://github.com/kennykb/NYbuildings_repair/blob/main/analyze-changesets.tcl#L320
The conversion to UTM (EPSG:32618) from latitude/longitude was so that the
buffer would get applied with a constant scale.
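To illustrate why the buffer is applied in UTM rather than in latitude/longitude: a degree of longitude shrinks with the cosine of latitude, so a buffer expressed in degrees is narrower east-west than north-south. The sketch below uses a spherical approximation (about 111,320 m per degree at the equator) and a hypothetical helper `utm_epsg` that derives the WGS 84 / UTM EPSG code from longitude; for Long Island it yields EPSG:32618 (UTM zone 18N). It ignores the Norway/Svalbard zone exceptions.

```python
import math

# Approximate metres per degree at the equator (spherical model).
EQUATOR_M_PER_DEG = 111_320.0


def utm_epsg(lon: float, lat: float) -> int:
    """EPSG code of the WGS 84 / UTM zone containing (lon, lat)."""
    zone = min(int((lon + 180.0) // 6) + 1, 60)  # clamp the lon == 180 edge
    return (32600 if lat >= 0 else 32700) + zone


def metres_per_degree(lat: float) -> tuple[float, float]:
    """Approximate (east-west, north-south) metres per degree at latitude lat."""
    return EQUATOR_M_PER_DEG * math.cos(math.radians(lat)), EQUATOR_M_PER_DEG


if __name__ == "__main__":
    ew, ns = metres_per_degree(40.9)  # roughly Suffolk County
    buf = 0.0001                      # a buffer given in degrees
    print(f"EPSG:{utm_epsg(-73.0, 40.9)}")
    print(f"{buf} deg is about {buf * ew:.1f} m E-W but {buf * ns:.1f} m N-S")
```

In a projected CRS like EPSG:32618 the buffer distance is simply metres in every direction, which is the constant scale the paragraph refers to.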

A building footprint import is geometry-heavy, and so your workflow is
quite different from, and heavier-weight than, the relatively limited
repair that I'm proposing here. In my case, the buildings have been
imported already (with addresses). Separately, a statewide import of
address points was also performed. The latter import studiously avoided
changing address information that was already present.

The building footprints themselves are of quite low quality, but to me they
are what they are. I'm hoping that having discovered this mess doesn't
saddle me with the burden of leading the entire effort to fix the
footprints. Instead, I'm proposing just to apply a 95% or 99% fix to the
building _addresses_. The result won't be of nearly the quality that would
have been achieved had buildings been imported with coordinated review -
indeed, in that case, the Microsoft footprints wouldn't have been imported
directly at all, at least in the city centers, but merely served as a base
outside OSM for mappers to build on.  But at least a few tens of thousands
of building footprints of unknown quality won't have demonstrably incorrect
street addresses.

Since the geometry isn't going to change in the more limited process,
there's much less need for the full weight of .osc files, and indeed, I'm
thinking in terms of not producing them at all (except as a required part
of a mechanical edit review).  It turns out that the JOSM remote control
API has a function that is nearly ideal for this.  For a particular
discrete change (which would likely be a span like "West Main Street in ZIP
code 13357"), it's possible, indeed easy, to make a remote control URL that
causes JOSM to download the required ways, make the tagging changes, and
await further input. Essentially, the URL is a command: "download ways
816043005,816043002,816043003,816042998,...  and set addr:street="West Main
Street" on all of them."  That's a lot less unrelated information, and
consequently a lot less opportunity for the data to go stale.
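A remote control URL of that kind might be generated along these lines. This is a sketch, assuming JOSM's `load_object` remote control command with its `objects` parameter (comma-separated ids, prefixed `w` for ways) and its `addtags` parameter (pipe-separated `key=value` pairs), on the default port 8111; check the JOSM RemoteControl documentation before relying on it. Grouping fixes by (street, ZIP) and the names `group_fixes` and `retag_url` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import quote, urlencode

JOSM_RC = "http://127.0.0.1:8111"  # JOSM remote control, default HTTP port


def group_fixes(fixes):
    """Group (way_id, street, zip) triples into one discrete change per (street, zip)."""
    groups = defaultdict(list)
    for way_id, street, zipcode in fixes:
        groups[(street, zipcode)].append(way_id)
    return dict(groups)


def retag_url(way_ids, tags):
    """Build a URL asking JOSM to download the ways and add the given tags."""
    objects = ",".join(f"w{wid}" for wid in way_ids)
    addtags = "|".join(f"{k}={v}" for k, v in tags.items())
    # quote (not quote_plus) so spaces become %20 rather than '+'.
    query = urlencode({"objects": objects, "addtags": addtags}, quote_via=quote)
    return f"{JOSM_RC}/load_object?{query}"
```

Usage: `for (street, zipcode), ids in group_fixes(fixes).items(): print(retag_url(ids, {"addr:street": street}))`. Very long way lists may need to be split across several URLs to stay within practical URL length limits.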

I think that for producing changeset files, that's the process I'd follow.
Let JOSM retrieve the data and apply the tags, then save a changeset from
that.

Of course, I can recover the sets of ways presumed to be part of the
imports at any time, so if people do want to perform more review/revision,
there's a place to start.  The review/revision would be welcome - MS
building footprints are pretty bad - but I'm already half-way through
another months-long OSM project and I'm simply not expecting to have the
time to take this one on.  Consider the address fix simply a bandage for
the most obvious problem - which is that OsmAnd routing and navigation
doesn't work in half of NY state despite a good-quality import of E911
address points. Gresham's Law of data applies here - bad data drives out
good.

> The big thing is really looking at the first chunk, and waiting a week
> after uploading it to make sure there aren't issues.  Let it flow into
> Nominatim, main render, OsmAnd Live, other places and have a look.
>

Yup.  Exactly the process that I followed with NY public lands on multiple
rounds (NYC watershed properties as an 'easy' first start of ~400 nature
reserves, then near-total reworks of crusty old imports of NYSDEC
properties and NYS parks, recreation and historic sites). I went really slow and
careful on the first couple of rounds. Now when the state publishes an
update, I can do the job myself in a week or two because the process is
nearly automated: I get a JOSM session with data layers for the map (with
changed objects preselected) and for the new geometry (already tagged
alike). In the ideal case, I can pick the new stuff, copy-paste onto the
map, Ctrl+Shift+G, and Bob's your uncle. The crazy quilt of protected areas
that you see in the Adirondacks (and most of the protected areas elsewhere
in the state as well) is largely the result of that effort.
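The "changed objects preselected" step depends on knowing which records changed between two of the state's releases. A minimal sketch of that classification, assuming each release has been loaded as a dictionary keyed by a stable per-property identifier with comparable attribute values (the function name and record layout are hypothetical, not taken from the actual NY public lands scripts):

```python
def classify_release(old: dict, new: dict):
    """Compare two {feature_id: attributes} releases of a dataset.

    Returns (added, removed, changed) as sorted lists of ids; 'changed'
    means the id exists in both releases but its attributes differ.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed
```

The ids in `changed` and `removed` are the ones whose OSM counterparts would be worth preselecting for review.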

> It also probably helps to have the code such that it can be rerun to
> produce a fresh candidate changeset, after some fixes.
>

Hence the GitHub repository.  The only part that I don't rerun is retrieving the
original changesets from OSM.  I'm close to incurring the wrath of the OWG
as it is!  Still, the scripts to do it are there; they simply check whether
they already have the data in local files and refrain from re-downloading
anything they already have.

> (Then later you could do a different process, which basically diffs the
> external data against osm and produces output files of where they
> differ, for hand review and thinking about.  But that's not what you
> asked.)
>

It's maybe what I should have asked, but for NY street and address mapping,
Skyler already has that part of the problem solved, I think. His import of
the address points was carefully tiptoeing around a fair amount of existing
data.


> Hope this helps - what we did and what you are doing are different of
> course,
>

It surely does - if only to reassure me that I'm on the right track here.


-- 
73 de ke9tv/2, Kevin