[Imports] Bing Building Import

Greg Morgan dr.kludge.gm at gmail.com
Wed Jul 4 04:10:35 UTC 2018


On Tue, Jul 3, 2018 at 2:37 AM, Christoph Hormann <osm at imagico.de> wrote:

> On Monday 02 July 2018, Greg Morgan wrote:
> > I have started work on the Bing building import for Arizona us.  I
> > have started this page here
> > https://wiki.openstreetmap.org/wiki/Import/Catalogue/US/BingBuildings
> > for the import.  This wiki page may be used by other mappers in
> > different states.
> >
> > [...]
>
> First thanks for bringing this up early in the process - although this
> is too early obviously for an import review it is good to have a broad
> discussion early.
>
>
Christoph,

Based on another email that you sent, I now see that there are two
Microsoft efforts.
There is this older version here:
https://wiki.openstreetmap.org/wiki/Microsoft_Building_Footprint_Data
There is the newer version here:
https://github.com/Microsoft/USBuildingFootprints
https://blogs.bing.com/maps/2018-06/microsoft-releases-125-million-building-footprints-in-the-us-as-open-data
The Bing Maps team has been applying these techniques as well with the goal
to increase the coverage of building footprints available for OpenStreetMap
<https://www.openstreetmap.org/>. As a result, today we are announcing that
we are releasing 124 Million building footprints in the United States to
the OpenStreetMap community.



> A few points i would like to comment on:
>
> * legal aspects:  Microsoft released the data under the ODbL but does
> not specify what data sources go into producing it (in particular
> training data!) and does not make any claims that the data is free of
> third party rights.  I would not be fine with importing data of unknown
> provenance and without a meaningful guarantee that it is free of third
> party rights.
>

Some of the answers are here.
https://github.com/Microsoft/USBuildingFootprints

Specifically, that means the sub-meter aerial imagery in the metro
Phoenix area will be better than the satellite imagery used in the rural
areas of Arizona.  As far as I have read, they are using the same Bing
imagery that I used in JOSM.  I believe that the provenance is their own
data, as noted here.

Training details

The training set consists of 5 million labeled images. Majority of the
satellite images cover diverse residential areas in US. For the sake of
good set representation, we have enriched the set with samples from various
areas covering mountains, glaciers, forests, deserts, beaches, coasts, etc.
Images in the set are of 256x256 pixel size with 1 ft/pixel resolution. The
training is done with CNTK toolkit using 32 GPUs.

Data Vintage

The vintage of the footprints depends on the vintage of the underlying
imagery. Because Bing Imagery is a composite of multiple sources it is
difficult to know the exact dates for individual pieces of data.



>
> * quality aspects:  In contrast to almost all other data sets where
> there is some quantitative specification of quality (either explicitly
> or implicitly due to the purpose the data set is created for) there is
> no indication of quality in what Microsoft has released beyond the
> vague and meaningless 'awsome quality' claims.  IMO this means that a
> proper import review would only be possible based on a thorough
> analysis of the quality of Microsoft's product that holds up to
> scientific scrutiny.
>

That is part of the process that I am undertaking now based on their
official answer.  I posted this sample earlier today:
https://drive.google.com/open?id=1I7BPMKLgABk8ikUdEPFpl6zKgh9E-sDN
Hans had a look at the data with me.  The file has too many nodes to import
based on the 10,000 node limit.  The footprints have around the same level
of detail as this subdivision entered by the craft mapper Turtur, a German
mapper judging by his edits:
https://www.openstreetmap.org/#map=17/33.67387/-112.40286
The Bing footprints look to be of higher quality than the work of the
craft mapper pezizomycotina, a US mapper from Pennsylvania:
https://www.openstreetmap.org/#map=16/33.6095/-111.9375
Microsoft describes the detection step this way: "... CNTK we apply our
Deep Neural Networks and the ResNet34 with RefineNet up-sampling layers to
detect building footprints from the Bing imagery."  It looks like this
CNTK-based process (https://github.com/Microsoft/CNTK) has problems with
black rooftops and with solar panels used as roofs over covered parking
spaces.  In my early opinion, the footprints are no better and no worse
than a craft mapper's drawing.  The craft mapper's skill level may play a
large part in the quality.  I sent a thank you to Turtur for the buildings.
I have yet to do so to pezizomycotina.  However, a nice square building is
always a good start.  I keep thinking that a couple of Mapillary runs would
allow me to collect addresses and build on pezizomycotina's work.

https://github.com/Microsoft/USBuildingFootprints

How good is the data?

Our metrics show that in the vast majority of cases the quality is at least
as good as data hand digitized buildings in OpenStreetMap. It is not
perfect, particularly in dense urban areas but it is still awesome.


> Regarding quality in general - you should not make the mistake of trying
> to assess quality by picking a few places and manually reviewing the
> data based on gut feeling - possibly with the same imagery used as
> reference as Microsoft used in data set generation.  What i
> called "analysis of the quality that holds up to scientific scrutiny"
> means picking a sufficiently large number of sample locations
> representative for the diverse geography of the US and doing a
> quantitative analysis based on reference data of known and high
> quality.
>
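
One way to make the "quantitative analysis based on reference data"
described above concrete is an overlap score between each imported
footprint and a trusted reference footprint.  The following is only a rough
sketch under stated assumptions: the function names are hypothetical, the
coordinates are assumed to be planar (already projected, e.g. in meters),
and intersection-over-union is my choice of metric here, not something
Microsoft or Christoph specified.

```python
def point_in_polygon(px, py, ring):
    """Ray-casting test; `ring` is a list of (x, y) vertices (assumed planar)."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) straddle the horizontal line through py?
        if (yi > py) != (yj > py):
            x_cross = xi + (py - yi) * (xj - xi) / (yj - yi)
            if px < x_cross:
                inside = not inside
        j = i
    return inside


def sampled_iou(ring_a, ring_b, step=0.1):
    """Approximate intersection-over-union of two polygons on a sample grid.

    Hypothetical helper: samples cell centers over the joint bounding box
    and counts how many fall in both polygons versus in either one.
    """
    xs = [x for x, _ in ring_a + ring_b]
    ys = [y for _, y in ring_a + ring_b]
    min_x, min_y = min(xs), min(ys)
    cols = int((max(xs) - min_x) / step) + 1
    rows = int((max(ys) - min_y) / step) + 1
    inter = union = 0
    for r in range(rows):
        for c in range(cols):
            px = min_x + (c + 0.5) * step
            py = min_y + (r + 0.5) * step
            in_a = point_in_polygon(px, py, ring_a)
            in_b = point_in_polygon(px, py, ring_b)
            inter += in_a and in_b
            union += in_a or in_b
    return inter / union if union else 0.0
```

Averaging such a score over many sample locations, rather than eyeballing a
few places, is closer to the kind of review being asked for.  The grid step
trades accuracy for speed.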

I have created a tool to generate potential import candidates.  I have not
had the chance to explore more of the data yet.  Your other post suggested
starting with Montana, but that will not be useful in my case.  The rural
Arizona area I posted is no different from Montana.  I've lived in both
places.  I will still create some other files to explore this topic.  Zoom
13 tiles can be used to group many of the buildings.  You can see the
Arizona counts here:
https://drive.google.com/open?id=1_ciQdkkC655xUqoKI_as4uJjQ6aVqAjfU5bddyk8Q1k
In the cases where a tile has over 9,999 nodes, a zoom 15 or 16 aggregate
will work best:
https://drive.google.com/open?id=12oQ6NxpyDRrnMjfGnANVGFD3fOxd_reZ0iiZstymg4A
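
The tile grouping described above can be sketched roughly as follows.  This
is not the actual tool; it is a minimal illustration that assumes the
standard slippy-map tile numbering, treats each building as a list of
(lon, lat) vertices, and assigns a building to a tile by its first vertex.
The names tile_for and group_buildings are made up for the example.

```python
import math
from collections import defaultdict

NODE_LIMIT = 10_000  # the node limit mentioned in this thread


def tile_for(lon, lat, zoom):
    """Return the (x, y) slippy-map tile containing a WGS84 point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def group_buildings(buildings, zoom=13, max_zoom=16):
    """Group buildings by tile, re-splitting tiles with over 9,999 nodes.

    `buildings` is a list of rings; each ring is a list of (lon, lat) pairs.
    Returns {(zoom, x, y): [ring, ...]}.
    """
    groups = defaultdict(list)
    for ring in buildings:
        lon, lat = ring[0]  # simplification: first vertex decides the tile
        groups[(zoom, *tile_for(lon, lat, zoom))].append(ring)

    result = {}
    for key, rings in groups.items():
        nodes = sum(len(ring) for ring in rings)
        if nodes >= NODE_LIMIT and zoom < max_zoom:
            # Too many nodes for one upload: regroup one zoom level deeper.
            result.update(group_buildings(rings, zoom + 1, max_zoom))
        else:
            result[key] = rings
    return result
```

A tile that is still over the limit at max_zoom is returned as is, so a real
tool would need another way to split it.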


> Microsoft's process documentation contains a number of hints that
> indicate things can go wrong in the process in ways that are likely to
> produce significant errors of kinds that are very unlikely to happen in
> manual mapping.  Without having reliable data on how often these things
> do happen (and how this varies between different geographic settings)
> you would essentially be doing a blind import.
>

Depending on the craft mapper, hand-drawn buildings can have the same
problems.  In one area of the data that was posted, two separate buildings
were drawn as one.  I've seen this same effect with craft mappers too.
Unless you know that there is a gap, it is easier to draw the buildings as
one.  As noted in the prior answer, this will not be a blind import.  There
will be a need to evaluate overlapping coverage between OSM and the Bing
data.  Let me post this again:

Training details

The training set consists of 5 million labeled images. Majority of the
satellite images cover diverse residential areas in US. For the sake of
good set representation, we have enriched the set with samples from various
areas covering mountains, glaciers, forests, deserts, beaches, coasts, etc.
Images in the set are of 256x256 pixel size with 1 ft/pixel resolution. The
training is done with CNTK toolkit using 32 GPUs.

There are 9,646 potential tiles to import for Arizona based on zoom 13
tiles. Some tiles have only one building, while others have many
thousands. Some of the smaller tiles can be aggregated to reduce
the number of changes to manage. The training details from their GitHub
page show that they are trying to improve the tool by testing different
geographic areas.
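
As an illustration of how the smaller tiles might be aggregated, here is a
rough sketch that merges sibling tiles into their parent tile for as long
as the combined node count stays under the limit.  The function name, the
input format (a dict of node counts per tile), and the min_zoom cutoff are
all assumptions made for the example, not part of any existing tool.

```python
from collections import defaultdict

NODE_LIMIT = 10_000  # the node limit mentioned in this thread


def merge_small_tiles(node_counts, min_zoom=10):
    """Merge sibling tiles into their parent while it stays under NODE_LIMIT.

    `node_counts` maps (zoom, x, y) -> number of nodes in that tile.
    Returns a new dict with some tiles replaced by their parent tile.
    """
    tiles = dict(node_counts)
    changed = True
    while changed:
        changed = False
        by_parent = defaultdict(list)
        for (z, x, y) in tiles:
            if z > min_zoom:
                # The parent tile one zoom level up covers four children.
                by_parent[(z - 1, x // 2, y // 2)].append((z, x, y))
        for parent, children in by_parent.items():
            total = sum(tiles[c] for c in children)
            if len(children) > 1 and total < NODE_LIMIT and parent not in tiles:
                for c in children:
                    del tiles[c]
                tiles[parent] = total
                changed = True
    return tiles
```

Only true siblings are merged here, so isolated single-building tiles stay
as they are; grouping those would need a different rule, such as merging by
county or by distance.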

Regards,
Greg