[Talk-ca] Some feedback on import quality in Toronto

Tim Elrick osm at elrick.de
Sun Feb 10 05:02:15 UTC 2019


Hi all,

After following the building import discussion for a while now, I wanted 
to chime in as well.

After moving to Montréal from Germany recently, I got more engaged with 
the local mappers here in MTL (before that, I mostly analysed OSM data 
scientifically).

I took part in the initial meeting of the Building Canada 2020 
initiative, in which great interest in the project was expressed by many 
institutions, organizations and businesses. However, apart from 
Statistics Canada, municipalities and OSMappers, no one seemed to be 
willing to invest in supporting the initiative with manpower or 
funding (to my knowledge). Therefore, I find it quite impressive what 
StatCan has achieved with the Open Building Database and do not share 
the view of some on this list that the initiative got off on the wrong 
foot; but that is all water under the bridge now.

So, yes, there seems to be some interest in using the data from the Open 
Building Database in OSM easily. However, I also doubt that one 
massive import can be the answer.

I'm generally hesitant with imports as such, maybe because I was 
acculturated in OSM in Germany where OSMappers value original entries 
much more than secondary data. Further, I'm skeptical that secondary 
data is necessarily better than original data (even from mapathons). I 
initiated two mapathons with university students in the context of 
Building Canada 2020. Both mapathons resulted in mostly nice buildings, 
I would say - and, when there is the odd not-so-nice building, there is 
still the validation step as we always used the tasking manager [1]. By 
the way, both mapathons used the ID editor; and, of course, you can 
square buildings in ID as well; so, I don't really understand the ID 
editor bashing that appears on this list here now and then. That said, 
of course, I prefer JOSM over ID as it is the more versatile tool, but 
to introduce interested persons to editing in OSM, ID is really nice.

I'm even more skeptical about imports after Yaro pointed us to the Texas 
import [2]. I wonder why there was no outcry there (or maybe there was 
and I did not hear about it) - the imported data is terrible: buildings 
are not parallel to the street, there are no right angles, and sometimes 
building parts are not even the right size. The fact is that secondary 
building footprints can come from many different data sources - from 
AutoCAD files and outlines hand-drawn by municipal GIS experts to 
photogrammetric and satellite machine learning sources. All those 
sources have their peculiarities, which, I think, you cannot satisfy 
with a one-size-fits-all import plan - especially as the Open Building 
Database in Canada is stitched together from those very different 
sources.
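
To make "no right angles" measurable for a given source, one could check 
how far each corner of a footprint deviates from 90 degrees. The 
following is only a minimal sketch, not part of any existing import 
tooling: the function name max_corner_deviation and the straight_tol 
parameter are made up for illustration, and it assumes the outline is 
given as (x, y) pairs in a projected, metric coordinate system, without 
the repeated closing node.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def max_corner_deviation(ring: List[Point], straight_tol: float = 1.0) -> float:
    """Largest deviation (in degrees) of any corner from a right angle.

    Nodes whose angle is within `straight_tol` degrees of 180 are treated
    as lying on a straight wall and are skipped (hypothetical parameter).
    """
    worst = 0.0
    n = len(ring)
    for i in range(n):
        px, py = ring[i - 1]          # previous node (wraps around)
        cx, cy = ring[i]              # corner being measured
        nx, ny = ring[(i + 1) % n]    # next node (wraps around)
        ax, ay = px - cx, py - cy     # vector from corner to previous node
        bx, by = nx - cx, ny - cy     # vector from corner to next node
        cross = ax * by - ay * bx
        dot = ax * bx + ay * by
        # Unsigned angle between the two walls, in the range 0..180 degrees.
        angle = math.degrees(math.atan2(abs(cross), dot))
        if angle > 180.0 - straight_tol:
            continue                  # node on a straight wall, not a corner
        worst = max(worst, abs(angle - 90.0))
    return worst
```

Running this over a sample of footprints per municipality would show 
which sources are already close to square and which are not, before 
anyone decides whether squaring is worth discussing for that source.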

In Montreal, e.g., the source for the Open Building Database is the 
données ouvertes des batiments. This is photogrammetric imagery probably 
turned into AutoCAD files, which then were exported to a shapefile and 
geojson. The building outlines are impressively precise. However, the 
open data files contain building blocks rather than single buildings 
[3]; the building dividers are offered in a separate shapefile (I assume 
due to the export from AutoCAD, see second image in [3]). Unfortunately, 
the Open Building Database only included the building blocks in its 
data set, without the building dividers, which makes it hard to import 
into OSM. Hence, a bit of non-trivial pre-processing of 
the original données ouvertes des batiments would be necessary to import 
them into OSM (as the building divider file does also include roof 
extensions and roof shapes). The local OSM group has been discussing this 
pre-processing for a while now at their local meetings (we started 
discussing this even before the Building Canada 2020 initiative 
started). As the City of Montreal has granted OSM the explicit use of 
their open data file, the way forward, we think, is to pre-process the 
original files. Further, there is extensive overlap of existing 
buildings with the open data file. Therefore, the imports in Montreal 
would have to happen in very small batches to not destroy the work of 
other OSMappers.
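
To illustrate why small batches help with the overlap: before a batch is 
uploaded, the incoming footprints can be split into those that are clear 
of existing OSM buildings and those that need a manual look. This is 
only a rough sketch under my own assumptions: the names bbox, 
bboxes_overlap and split_batch are hypothetical, outlines are plain 
(x, y) lists, and a bounding-box test stands in for a proper polygon 
intersection, so it errs on the side of flagging too much.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def bbox(outline: List[Point]) -> BBox:
    """Axis-aligned bounding box of a building outline."""
    xs = [p[0] for p in outline]
    ys = [p[1] for p in outline]
    return (min(xs), min(ys), max(xs), max(ys))


def bboxes_overlap(a: BBox, b: BBox) -> bool:
    """True if the two boxes share interior area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def split_batch(
    incoming: Dict[str, List[Point]],
    existing: List[List[Point]],
) -> Tuple[List[str], List[str]]:
    """Split incoming footprints into (clear, needs_review) lists of ids.

    A footprint needs review if its bounding box overlaps the bounding box
    of any building that is already mapped.
    """
    existing_boxes = [bbox(outline) for outline in existing]
    clear: List[str] = []
    needs_review: List[str] = []
    for ident, outline in incoming.items():
        box = bbox(outline)
        if any(bboxes_overlap(box, other) for other in existing_boxes):
            needs_review.append(ident)
        else:
            clear.append(ident)
    return clear, needs_review
```

With small batches, the needs_review list stays short enough that 
someone can actually compare each flagged footprint with the hand-mapped 
building it would touch.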

I am also pretty skeptical about the simplification of the secondary 
data before importing that was suggested on the list here. As the data 
sources of the Open Building Database are very diverse, one 
simplification method cannot fit all data sources and can lead to 
harming the ground-truth principle. This even happened when Nate tried 
to simplify buildings by hand in Toronto [4], as pointed out by Yaro. 
There might be the odd case where secondary data has too many nodes in 
a straight line, but usually, I would assume, most data sources stem 
from GIS experts or machine learning algorithms; neither would include 
more nodes than necessary for a building outline. And honestly, 
I don't buy the argument of 'too much data clutters our planet dump'. 
Storage space and processing power are no longer an issue, and I would 
like to see the world as precisely represented as possible in OSM; in 
many parts of the OSM world you now find single trees, mailboxes and 
lamp posts in OSM; isn't that great? As for buildings, I would like to 
see all the bay windows, nooks and crannies - even in Canada.
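
For the odd case of surplus nodes in a straight line, a very 
conservative clean-up is possible that leaves the shape untouched: only 
drop a node if it lies exactly on the line between its two neighbours. 
The sketch below is my own illustration, not what JOSM or any import 
script does; drop_collinear_nodes and its tolerance parameter are 
hypothetical, and the outline is assumed to be a list of (x, y) pairs in 
projected coordinates without the repeated closing node.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def drop_collinear_nodes(ring: List[Point], tolerance: float = 1e-9) -> List[Point]:
    """Remove nodes that sit on a straight line between their neighbours.

    `tolerance` is compared against the cross product of the two adjacent
    edges (twice the area of the triangle they span), so the default only
    removes nodes that are collinear up to rounding error.
    """
    n = len(ring)
    if n <= 3:
        return list(ring)  # nothing can be removed from a triangle
    kept: List[Point] = []
    for i in range(n):
        px, py = ring[i - 1]          # previous node (wraps around)
        cx, cy = ring[i]
        nx, ny = ring[(i + 1) % n]    # next node (wraps around)
        in_x, in_y = cx - px, cy - py     # edge arriving at the node
        out_x, out_y = nx - cx, ny - cy   # edge leaving the node
        cross = in_x * out_y - in_y * out_x
        dot = in_x * out_x + in_y * out_y
        # Straight means: no turn (cross ~ 0) and same direction (dot > 0).
        is_straight = abs(cross) <= tolerance and dot > 0
        if not is_straight:
            kept.append(ring[i])
    return kept
```

Because every real corner is kept, bay windows, nooks and crannies 
survive; only the redundant nodes along a wall disappear.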

How to proceed? For Montréal: Once we have looked further into the 
challenges of pre-processing the Montreal open dataset, I guess we will 
propose a separate import plan. If anyone would like to join us in 
discussing the 
pre-processing, please contact me and we can continue on the Montréal 
OSM list. Oh, and by the way, while we were all discussing the import 
since December, almost 3,000 buildings were mapped by hand in the Greater 
Montreal region [5].

That all being said, I do not want to stop any of you from importing 
buildings. I just think that we have to do this more bit by bit to 
cater for all the peculiarities of the heterogeneous data sources of the 
Open Building Database.

Happy mapping to everyone,
Tim

[1] see e.g. http://tasks.osmcanada.ca/project/91
[2] https://www.openstreetmap.org/#map=19/32.97102/-96.78231
[3] https://imgur.com/a/S8Nq5rg
[4] https://i.imgur.com/H10360K.png
[5] http://overpass-turbo.eu/s/FWH

On 2019-02-03 18:35, Yaro Shkvorets wrote:
Having reviewed the changeset, here are my 2 cents. OsmCha link for 
reference: https://osmcha.mapbox.com/changesets/66881357/

1) IMO squaring is not needed in most of those cases.
- You can see the difference between square and non-square ONLY at high zoom 
level. And even then, it's not visible to the naked eye. We are talking 
about inches here.
- Sometimes squaring is plain wrong to be applied here. Even though you 
paid very close attention you managed to square a couple of non-square 
buildings. Like this facade is not supposed to be square for example: 
https://i.imgur.com/H10360K.png I might be OK with squaring 
almost-square angles if there is a simple plugin for that. The way you 
propose to do it, by going building-by-building and pressing Q is 
completely unsustainable and sometimes makes things bad.
- Another thing, this particular neighbourhood is pretty dense and 
mature and therefore has mostly square buildings. I can only imagine how 
bad it would become if you ask people to square things in newer 
developments where buildings often come in irregular shapes.
- As mentioned above, many successful imports didn't require squaring. 
In this Texas one, 100% of buildings are not perfectly square: 
https://www.openstreetmap.org/#map=19/32.97102/-96.78231


2) Simplification is good to have, sure. Obviously standard Shift-Y in 
JOSM is a non-starter. If we can find a good way to simplify ways without 
losing original geometry and causing overlapping issues we should do it. 
But even then, reducing 500MB province extract to 499MB should not be a 
hill to die on.

3) Manually mapping all the sheds and garages is completely 
unsustainable. Having seen over the last couple of years how much real 
interest there is in doing actual work importing buildings in Canada 
(almost zero) adding this requirement will undoubtedly kill the project. 
Sure you will meticulously map your own neighbourhood, but who will map 
thousands of other places with the same attention to detail? Also, you 
did a rather poor job at classifying the buildings you added, tagging them all 
with building=yes. Properly classifying secondary buildings like sheds 
and garages in a project like this is pretty important IMO. I agree with 
John, we should leave sheds to local mappers to trace manually.

To sum up, yes we can do better. But this is the perfect example when 
"better" is the enemy of "good".

On Sun, Feb 3, 2019 at 12:34 PM Nate Wessel <bike756 at gmail.com> wrote:

    Hi all,

    I had a chance this morning to work on cleaning up some of the
    already-imported data in Toronto. I wanted to be a little methodical
    about this, so I picked a single typical block near where I live.
    All the building data on this block came from the import and I did
    everything in one changeset:
    https://www.openstreetmap.org/changeset/66881357

    What I found was that:

    1) Every single building needed squaring

    2) Most buildings needed at least some simplification.

    3) 42 buildings were missing.

    I knew going in that the first two would be an issue, but what
    really surprised me was just how many sheds had not been imported.
    There are only 53 houses on the block, but 42
    sheds/garages/outbuildings, some of them quite large, and none of
    which had been mapped.

    I haven't seen the quality of the outbuildings in the source data,
    and maybe I would change my mind if I did, but I think if we're
    going to do this import properly, we're going to have to bring in
    the other half of the data. I had seen in the original import
    instructions that small buildings were being excluded - was there a
    reason for this?

    I also want to say: given how long it took me to clean up and
    properly remap this one block, I'll say again that the size of the
    import tasks is way, way, way too large. There is absolutely no way
    that someone could have carefully looked at and verified this data
    as it was going in. I just spent a half hour fixing up probably
    about one-hundredth of a task square.

    We can do better than this!

    -- 
    Nate Wessel
    Jack of all trades, Master of Geography, PhD candidate in Urban Planning
    NateWessel.com <http://natewessel.com>

    _______________________________________________
    Talk-ca mailing list
    Talk-ca at openstreetmap.org <mailto:Talk-ca at openstreetmap.org>
    https://lists.openstreetmap.org/listinfo/talk-ca



-- 
Best Regards,
           Yaro Shkvorets


