[Talk-us] Prototype tool to conflate/import Microsoft buildings

Max Erickson maxerickson at gmail.com
Fri Mar 31 17:20:32 UTC 2017


There are several scripts: they output some summary information,
generate modified OSM data, and extract buildings that aren't present
in OSM at all.

There's more description in the readme:

https://github.com/maxerickson/osm_ms_buildings

Only tested against the Michigan data, which has 8140 geometries
concentrated in the Detroit area. Not sure how things will go with
larger datasets. A clipped-out region of reasonable size should work
fine (the scripts complete in a few seconds for ~10,000 buildings on
my older laptop).

For Michigan/Detroit, the main takeaway is that a *lot* of the
buildings are already in OpenStreetMap.

This histogram is calculated using the largest single overlap for each
existing OpenStreetMap building (data at
https://drive.google.com/open?id=0BxwWB33rZeUVU2FVaEVGUk9sNFU ):

bin        count
0.0-0.1     7645
0.1-0.2       41
0.2-0.3       34
0.3-0.4       53
0.4-0.5       64
0.5-0.6      106
0.6-0.7      112
0.7-0.8      280
0.8-0.9      954
0.9-1.0     5133
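
As a rough illustration of the binning above (not code from the
repository): given the largest overlap fraction for each OSM building,
a histogram like this can be tallied as follows. The function name and
the sample values in the usage note are hypothetical.

```python
def overlap_histogram(fractions, bins=10):
    """Count overlap fractions (0.0 to 1.0) in equal-width bins."""
    counts = [0] * bins
    for f in fractions:
        # Clamp the index so a fraction of exactly 1.0 lands in the last bin.
        index = min(int(f * bins), bins - 1)
        counts[index] += 1
    return counts


def format_histogram(counts):
    """Render the counts as 'lo-hi  count' lines."""
    bins = len(counts)
    lines = []
    for i, count in enumerate(counts):
        lo, hi = i / bins, (i + 1) / bins
        lines.append("%.1f-%.1f %8d" % (lo, hi, count))
    return "\n".join(lines)
```

For example, overlap_histogram([0.0, 0.05, 0.85, 0.95, 1.0]) puts two
values in the first bin, one in the 0.8-0.9 bin and two in the last.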

The areas there are presently calculated naively, directly on WGS 84
coordinates, but I think the relative information should be fine.
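
One way to see why the ratios hold up: for a footprint as small as a
building, converting degrees to metres is close to a constant scale
factor in each axis, and that factor cancels when one area is divided
by another. The sketch below is an assumption-laden illustration, not
the repository's code; it uses the shoelace formula and a simple
equirectangular approximation, and the helper names and the
metres-per-degree constant are hypothetical choices.

```python
import math


def shoelace_area(ring):
    """Planar area of a simple polygon given as (x, y) pairs."""
    x0, y0 = ring[0]  # shift to the first vertex for numerical stability
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i][0] - x0, ring[i][1] - y0
        x2, y2 = ring[(i + 1) % n][0] - x0, ring[(i + 1) % n][1] - y0
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def to_local_metres(ring, lat0):
    """Equirectangular approximation around latitude lat0 (degrees)."""
    k = 111320.0  # assumed approximate metres per degree
    c = math.cos(math.radians(lat0))
    return [(lon * k * c, lat * k) for lon, lat in ring]


# Hypothetical building near Detroit and an overlap covering half of it.
building = [(-83.050, 42.330), (-83.049, 42.330),
            (-83.049, 42.331), (-83.050, 42.331)]
overlap = [(-83.050, 42.330), (-83.0495, 42.330),
           (-83.0495, 42.331), (-83.050, 42.331)]

ratio_degrees = shoelace_area(overlap) / shoelace_area(building)
ratio_metres = (shoelace_area(to_local_metres(overlap, 42.33)) /
                shoelace_area(to_local_metres(building, 42.33)))
```

The absolute areas differ wildly between the two coordinate systems,
but both ratios come out at about one half.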

So of the 14422 buildings present in OSM, 7630 don't overlap the
Microsoft buildings at all and 5 or 6 thousand overlap quite a lot.
From the visual review I've done, I'd say that the existing OSM
buildings tend to be more detailed and line up more closely with
current Bing Imagery.

Separately, there are 410 buildings in the Microsoft data that do not
exist in OSM (take a look:
https://drive.google.com/open?id=0BxwWB33rZeUVNTVJVmNQcEg2RHM ).

A more sophisticated matching algorithm is probably a good idea, but
for the Detroit data I would be pretty comfortable mechanically adding
the heights for the buildings where the overlap is roughly 80% or
higher. The new buildings would then get a more manual process
(checking against newer imagery?), as would capturing the information
from the several hundred buildings with smaller overlaps.
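
The triage described above could be sketched roughly like this. It is
only an illustration under stated assumptions: the function name, the
category labels and the exact 0.8 cutoff are hypothetical stand-ins for
"roughly 80%", and nothing here is taken from the repository.

```python
def triage(overlap, threshold=0.8):
    """Sort an OSM building into a workflow by its largest overlap
    fraction with a Microsoft building (0.0 to 1.0)."""
    if overlap >= threshold:
        # Strong match: candidate for mechanically copying the height.
        return "copy_height"
    if overlap > 0.0:
        # Partial match: needs a person to look at it.
        return "manual_review"
    # No Microsoft footprint touches this building.
    return "no_match"


def summarize(overlaps, threshold=0.8):
    """Count how many buildings fall into each workflow."""
    summary = {"copy_height": 0, "manual_review": 0, "no_match": 0}
    for overlap in overlaps:
        summary[triage(overlap, threshold)] += 1
    return summary
```

Microsoft buildings with no OSM counterpart at all (the 410 mentioned
earlier) are a separate list and would not pass through this function.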


Max


