[Talk-ca] Some feedback on import quality in Toronto

Tim Elrick osm at elrick.de
Sat Feb 16 17:22:02 UTC 2019


Hi John,

Thanks for pointing me to the license website. The open data of the City 
of Montreal is licensed CC-BY 4.0 and the City has explicitly granted 
OSM the right to use the data on top of that. See: 
http://donnees.ville.montreal.qc.ca/portail/licence/

StatsCan's Open Building Database uses exactly the same data source;
however, as I pointed out in my last e-mail, it does not split the
building blocks into actual buildings. Furthermore, the open data of the
City of Montreal includes building heights, which are lost in the OBD.
These are the reasons why we would like to import the original open
data.

Cheers,
Tim

On 2019-02-16 11:21, john whelan wrote:
When you look at importing Montreal you might like to look at the
following first.

https://wiki.osmfoundation.org/wiki/OGL_Canada_and_local_variants

Note that if the Montreal data is available through Stats Can under the
federal government open data license, it might be better to use that data
source from a licensing perspective.

Although data can be given to OpenStreetMap, I don't think there is a
foolproof method of recording the fact.  If one person has the paper
record, fine, but if they are no longer part of the community then there
may be a problem if the license is challenged.

Cheerio John

On Sun, 10 Feb 2019 at 00:04, Tim Elrick <osm at elrick.de> wrote:

     Hi all,

     After following the building import discussion for a while now, I
     wanted to chime in as well.

     After moving to Montréal from Germany recently, I got more engaged
     with the local mappers here in MTL (before that, I mostly analysed
     OSM data scientifically).

     I took part in the initial meeting of the Building Canada 2020
     initiative, in which great interest in the project was expressed by
     many institutions, organizations and businesses. However, apart from
     Statistics Canada, municipalities and OSMappers, no one seemed to be
     willing to invest manpower or funding in support of the initiative
     (to my knowledge). Therefore, I find what StatCan has achieved with
     the Open Building Database quite impressive, and I do not share the
     view of some on this list that the initiative got off on the wrong
     foot; but that is all water under the bridge now.

     So, yes, there seems to be some interest in making the data from the
     Open Building Database easy to use in OSM. However, I also doubt
     that one massive import can be the answer.

     I'm generally hesitant with imports as such, maybe because I was
     acculturated in OSM in Germany where OSMappers value original
     entries much more than secondary data. Further, I'm skeptical that
     secondary data is necessarily better than original data (even from
     mapathons). I initiated two mapathons with university students in
     the context of Building Canada 2020. Both mapathons resulted in
     mostly nice buildings, I would say - and, when there is the odd
     not-so-nice building, there is still the validation step as we
     always used the tasking manager [1]. By the way, both mapathons used
     the ID editor; and, of course, you can square buildings in ID as
     well; so, I don't really understand the ID editor bashing that
     appears on this list here now and then. That said, of course, I
     prefer JOSM over ID as it is the more versatile tool, but to
     introduce interested persons to editing in OSM, ID is really nice.

     I'm even more skeptical about imports after Yaro pointed us to the
     Texas import [2]. I wonder why there was no outcry there (or maybe
     there was and I did not hear about it) - the imported data is
     terrible: buildings not parallel to the street, no right angles,
     sometimes not even the right size of building parts. The fact is that
     building footprints in secondary data can come from many different
     sources - from AutoCAD drawings and outlines hand-drawn by municipal
     GIS experts to photogrammetric and satellite machine learning
     sources. All those sources have their peculiarities, which, I think,
     you cannot satisfy with one import plan that fits all - especially as
     the Open Building Database in Canada is stitched together from those
     very different sources.

     In Montreal, e.g., the source for the Open Building Database is the
     données ouvertes des batiments. This is photogrammetric imagery
     probably turned into AutoCAD files, which then were exported to a
     shapefile and geojson. The building outlines are impressively
     precise; however, the open data files contain building blocks
     rather than single buildings [3]. The building dividers are offered
     in a separate shapefile (I assume due to the export from AutoCAD,
     see second image in [3]). Unfortunately, the Open Building Database
     only included the building blocks in its data set and not the
     building dividers, which makes it hard to import into OSM. Hence,
     a bit of non-trivial pre-processing of the original données
     ouvertes des batiments would be necessary to import them into OSM
     (as the building divider file also includes roof extensions and
     roof shapes). The local OSM group has been discussing this
     pre-processing for a while now at their local meetings (we started
     discussing it even before the Building Canada 2020 initiative
     started). As the City of Montreal has explicitly granted OSM the use
     of their open data file, the way forward, we think, is to
     pre-process the original files. Further, there is extensive overlap
     between existing buildings in OSM and the open data file. Therefore,
     the imports in Montreal would have to happen in very small batches
     so as not to destroy the work of other OSMappers.
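
     A minimal sketch of how such small batches could be pre-sorted,
     assuming building outlines are available as plain coordinate lists.
     The function names and the bounding-box test are illustrative
     assumptions, not part of any existing import plan or tool. A
     candidate whose bounding box touches an already mapped building is
     set aside for manual conflation instead of being added blindly.

```python
from typing import List, Tuple

Point = Tuple[float, float]                 # (x, y) or (lon, lat)
Ring = List[Point]                          # one building outline
BBox = Tuple[float, float, float, float]    # (min_x, min_y, max_x, max_y)


def bbox(ring: Ring) -> BBox:
    """Axis-aligned bounding box of an outline."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def bboxes_overlap(a: BBox, b: BBox) -> bool:
    """True if two bounding boxes intersect or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def split_candidates(candidates: List[Ring], existing: List[Ring]):
    """Split import candidates into (clear, conflicting).

    'conflicting' means the candidate's bounding box meets the bounding
    box of at least one existing building. This is deliberately
    conservative: it may flag neighbours that do not truly overlap.
    """
    existing_boxes = [bbox(r) for r in existing]
    clear, conflicting = [], []
    for ring in candidates:
        box = bbox(ring)
        if any(bboxes_overlap(box, e) for e in existing_boxes):
            conflicting.append(ring)
        else:
            clear.append(ring)
    return clear, conflicting
```

     The pairwise comparison is fine for one small batch at a time; a
     spatial index would be needed for anything city-sized.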

     I am also pretty skeptical about simplifying the secondary data
     before importing, as was suggested on the list here. As the data
     sources of the Open Building Database are very diverse, one
     simplification method cannot fit all of them and can end up
     violating the ground-truth principle. This even happened when Nate
     tried to simplify buildings by hand in Toronto [4], as pointed out
     by Yaro. There might be the odd case where secondary data has too
     many nodes in a straight line, but I would assume that most data
     sources stem from GIS experts or machine learning algorithms;
     neither would usually include more nodes than necessary for a
     building outline. And honestly, I don't buy the argument that 'too
     much data clutters our planet dump'. Storage space and processing
     power are no longer an issue, and I would like to see the world as
     precisely represented as possible in OSM; in many parts of the OSM
     world you now find single trees, mailboxes and lamp posts in OSM;
     isn't that great? As for buildings, I would like to see all the bay
     windows, nooks and crannies - even in Canada.
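
     For the odd case of redundant nodes in a straight line, a cautious
     simplification could look like the sketch below. It assumes closed
     outlines (first point repeated as last) in projected coordinates
     such as metres; the function names and the 1 cm default tolerance
     are illustrative assumptions. It only drops nodes that lie on the
     line between their neighbours, so bay windows, nooks and crannies
     survive.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def distance_to_line(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the infinite line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        # a and b coincide: fall back to the point-to-point distance
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / length


def drop_collinear_nodes(ring: List[Point], tolerance: float = 0.01) -> List[Point]:
    """Remove nodes that sit on a straight line between their neighbours.

    Each node is compared against the previously kept node and the next
    original node; it is dropped only if it deviates from that line by
    no more than 'tolerance' (same units as the coordinates).
    """
    pts = ring[:-1]  # work on the open ring, without the closing point
    n = len(pts)
    kept: List[Point] = []
    for i in range(n):
        prev = kept[-1] if kept else pts[-1]
        nxt = pts[(i + 1) % n]
        if distance_to_line(pts[i], prev, nxt) > tolerance:
            kept.append(pts[i])
    if len(kept) < 3:
        # Degenerate result: leave the outline untouched
        return list(ring)
    return kept + [kept[0]]
```

     With a tolerance this small the outline keeps its shape; raising it
     turns this into real generalisation, with the risks described above.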

     How to proceed? For Montréal: once we have looked further into the
     challenges of pre-processing the Montreal open dataset, I guess we
     will propose a separate import plan. If anyone would like to join us
     in discussing the pre-processing, please contact me and we can
     continue on the Montréal OSM list. Oh, and by the way, while we have
     all been discussing the import since December, almost 3,000
     buildings were mapped by hand in the Greater Montreal region [5].

     That all being said, I do not want to stop any of you from
     importing buildings. I just think that we have to do this bit by
     bit to cater for all the peculiarities of the heterogeneous data
     sources of the Open Building Database.

     Happy mapping to everyone,
     Tim

     [1] see e.g. http://tasks.osmcanada.ca/project/91
     [2] https://www.openstreetmap.org/#map=19/32.97102/-96.78231
     [3] https://imgur.com/a/S8Nq5rg
     [4] https://i.imgur.com/H10360K.png
     [5] http://overpass-turbo.eu/s/FWH

     On 2019-02-03 18:35, Yaro Shkvorets wrote:
     Having reviewed the changeset, here are my 2 cents. OsmCha link for
     reference: https://osmcha.mapbox.com/changesets/66881357/

     1) IMO squaring is not needed in most of those cases.
     - You can see the difference between square and non-square ONLY at
     high zoom levels. And even then, it's not visible to the naked eye.
     We are talking about inches here.
     - Sometimes squaring is plainly the wrong thing to apply. Even though
     you paid very close attention, you managed to square a couple of
     non-square buildings. This facade, for example, is not supposed to
     be square: https://i.imgur.com/H10360K.png I might be OK with
     squaring almost-square angles if there is a simple plugin for that.
     The way you propose to do it, by going building-by-building and
     pressing Q, is completely unsustainable and sometimes makes things
     worse.
     - Another thing: this particular neighbourhood is pretty dense and
     mature and therefore has mostly square buildings. I can only imagine
     how bad it would become if you asked people to square things in newer
     developments where buildings often come in irregular shapes.
     - As mentioned above, many successful imports didn't require
     squaring. In this Texas one, 100% of buildings are not perfectly
     square: https://www.openstreetmap.org/#map=19/32.97102/-96.78231
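
     To illustrate what "squaring only almost-square angles" could mean
     in practice, here is a rough sketch. It is not an existing JOSM
     plugin; the function names and the 3 degree threshold are
     assumptions, and coordinates are assumed to be projected (angles
     measured on raw lon/lat would be distorted). It measures how far
     each corner is from a multiple of 90 degrees and accepts a building
     for squaring only if every corner is close.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def corner_deviations(ring: List[Point]) -> List[float]:
    """Deviation, in degrees, of each corner's turn from the nearest
    multiple of 90. 'ring' is closed (first point repeated as last)."""
    pts = ring[:-1]
    n = len(pts)
    deviations = []
    for i in range(n):
        ax, ay = pts[i - 1]
        bx, by = pts[i]
        cx, cy = pts[(i + 1) % n]
        heading_in = math.atan2(by - ay, bx - ax)
        heading_out = math.atan2(cy - by, cx - bx)
        turn = math.degrees(heading_out - heading_in) % 90.0
        deviations.append(min(turn, 90.0 - turn))
    return deviations


def is_almost_square(ring: List[Point], max_deviation_deg: float = 3.0) -> bool:
    """True only if every corner is within the threshold of a right
    (or straight) angle, so deliberately angled facades are left alone."""
    return all(d <= max_deviation_deg for d in corner_deviations(ring))
```

     A slightly skewed rectangle passes this check, while an outline
     with one deliberately slanted wall does not and would be left
     untouched.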


     2) Simplification is good to have, sure. Obviously standard Shift-Y
     in JOSM is a non-starter. If we can find a good way to simplify ways
     without losing the original geometry and causing overlapping issues,
     we should do it. But even then, reducing a 500MB province extract to
     499MB should not be a hill to die on.

     3) Manually mapping all the sheds and garages is completely
     unsustainable. Having seen over the last couple of years how much
     real interest there is in doing actual work importing buildings in
     Canada (almost zero), adding this requirement will undoubtedly kill
     the project. Sure, you will meticulously map your own neighbourhood,
     but who will map thousands of other places with the same attention
     to detail? Also, you did a rather poor job of classifying the
     buildings you added, tagging them all with building=yes. Properly
     classifying secondary buildings like sheds and garages in a project
     like this is pretty important IMO. I agree with John, we should
     leave sheds to local mappers to trace manually.

     To sum up, yes we can do better. But this is a perfect example of
     "better" being the enemy of "good".

     On Sun, Feb 3, 2019 at 12:34 PM Nate Wessel <bike756 at gmail.com> wrote:

         Hi all,

         I had a chance this morning to work on cleaning up some of the
         already-imported data in Toronto. I wanted to be a little
         methodical about this, so I picked a single typical block near
         where I live. All the building data on this block came from the
         import and I did everything in one changeset:
         https://www.openstreetmap.org/changeset/66881357

         What I found was that:

         1) Every single building needed squaring

         2) Most buildings needed at least some simplification.

         3) 42 buildings were missing.

         I knew going in that the first two would be an issue, but what
         really surprised me was just how many sheds had not been
         imported. There are only 53 houses on the block, but 42
         sheds/garages/outbuildings, some of them quite large, and none
         of which had been mapped.

         I haven't seen the quality of the outbuildings in the source
         data, and maybe I would change my mind if I did, but I think if
         we're going to do this import properly, we're going to have to
         bring in the other half of the data. I had seen in the original
         import instructions that small buildings were being excluded -
         was there a reason for this?

         I also want to say: given how long it took me to clean up and
         properly remap this one block, I'll say again that the size of
         the import tasks is way, way, way too large. There is absolutely
         no way that someone could have carefully looked at and verified
         this data as it was going in. I just spent a half hour fixing up
         probably about one-hundredth of a task square.

         We can do better than this!

         --
         Nate Wessel
         Jack of all trades, Master of Geography, PhD candidate in Urban
         Planning
         NateWessel.com <http://natewessel.com>

         _______________________________________________
         Talk-ca mailing list
         Talk-ca at openstreetmap.org <mailto:Talk-ca at openstreetmap.org>
         https://lists.openstreetmap.org/listinfo/talk-ca



     --
     Best Regards,
                Yaro Shkvorets
