[Imports] Ongoing Canadian building import needs to be stopped, possibly reverted

Nate Wessel bike756 at gmail.com
Fri Jan 18 00:08:54 UTC 2019


Hi everyone,
I've had a bit of time today to gather my thoughts on this import and I 
hope I can offer something more productive to the discussion now. First, 
I want to apologize to the importers for the panicked tone of my initial 
email and private communications. I saw after a long day that the 
buildings were literally one task away from completely swamping my own 
neighborhood, and I hope it's understandable that I felt pretty 
defensive about it, having put so much time into my own little corner of 
the city over the years. So, I want to thank you all for taking that in 
stride, and especially for agreeing to stop the import while we discuss 
the issues I raised. If I came off as harsh or unappreciative, please be 
sure that I didn't mean to. We're all volunteers here and I know 
first-hand how much work goes into doing something like this. I'm 
actually one of the lead mappers for a building import in my hometown at 
the moment - I'm not opposed in any way to imports of buildings if 
they're done right.

But I've also spent way too much time cleaning up bad import data - 
whether it's TIGER imports from way back when or more recently the 
disturbingly sloppy address ranges that showed up last year in Toronto. 
In my experience, it takes so much less time to get this right in the 
first pass than it does to clean up the damage months or years later 
when we realize some mistakes were made or the data could have been 
handled better.

There have been a lot of responses to some of the specific things I 
said, so instead of replying inline, let me try to rephrase the big 
issues as I see them with some of the new perspective and information in 
mind.

A ) This import, essentially, did not get approval from the imports 
list. While an email was sent, I think that it was so vague and 
misdirected (surely with no nefarious intent) that it would be hard or 
impossible for a casual subscriber to the list to understand the scope 
of the project. Without having understood the scope of the project, 
which is utterly huge, the import plan was not given adequate scrutiny. 
This is evidenced by the relative lack of discussion.

B ) I didn't know this was going on until I saw it happening. While my 
personal knowledge is obviously not a necessary precondition for 
successful imports, I do feel it may be a sign that the scale of this 
effort is wrong for the task at hand.
While the technical details and any processing of the data are probably 
best handled at the national level, since it all comes from the same 
source and presumably has the same technical hurdles to overcome, I 
can't imagine that the whole country can be asked whether it wants 
buildings to be imported or not, or what concerns and requirements would 
come attached to such an import. There will be so much local variation 
that I think those conversations just have to happen at a more local 
level. If that local effort had been made, I'd be surprised if I hadn't 
heard about it. Rather than attempt to notify all Canadian mappers, 
would it be too much to ask that this go province by province or city 
by city? If I had seen 'Toronto' or 'Ontario' anywhere on this mailing 
list, you can be sure my ears would have pricked up right quick.

C ) This import is going way too fast - there is simply no way three 
people could have carefully imported as much data as has been imported 
in the time since this started. Like I said, I'm working on an import 
myself and it's long, tedious, and strangely satisfying work when you're 
doing it carefully. In my opinion, these task squares are at least ten 
times too large. When I said above that my neighborhood would be 
swamped by the next task, I really mean swamped. 90% of the places I go 
in Toronto fit inside a single task. The tasking manager we're using for 
the building import in Hamilton County allows one to upload custom task 
geometries. I got a bit silly with the task shapes perhaps 
(https://tasks.openstreetmap.us/project/107) but I think the size is 
about right - importing 500-1000 building footprints should take ~10-30 
minutes, with a careful check of the imagery, a check with JOSM's 
validation tool, a second validation after native OSM data has been 
merged with the import data... I would never attempt a task as large as 
the smallest task here, and I do not think that reflects poorly on my 
abilities or experience. If the tasking manager doesn't allow smaller 
tasks then it is the wrong tool for the job.

I have several specific technical issues with / questions about the data 
that are probably best addressed in some other forum, like on the wiki. 
If I may, I'd like to save those for the moment, because I think I see a 
productive way to keep moving forward with things while we discuss.

The data needs to be carefully and thoroughly validated at some point, 
right? May I suggest that everyone stop importing new data and engage 
themselves in cleaning and validating the data that has already been 
brought in, neighborhood by neighborhood? There is plenty to keep us all 
busy for weeks. While doing that, let's make a list of issues that we 
come across and discuss ways that they can be addressed before any new 
buildings are brought in. We can take this as a learning experience and 
make the rest of this import process better.

I suspect some will feel this is redundant - wasn't the Ottawa import 
the test run? My response has to be that the data and the process are 
not yet as good as they can and should be, so another round of trials 
and iterative improvement is needed before this rolls out a mari usque 
ad mare (from sea to sea).

With all due respect, patience, and humility,

Nate Wessel
Jack of all trades, Master of Geography, PhD candidate in Urban Planning
NateWessel.com <http://natewessel.com>

On 1/17/19 3:13 PM, OSM Volunteer stevea wrote:
> Thank you, John.
>
> On Jan 17, 2019, at 11:22 AM, john whelan <jwhelan0112 at gmail.com> wrote:
>> First if you look at the 2020 wiki page history you'll see there is a lot of input from Steve.  My concern with this very detailed input is it made it hard for a new person to quickly locate relevant information, an overview if you like.
> I encourage an "Overview" section or what some call a "Quick Start."  For some (experienced OSM mappers), this could suffice for "jumping in right now."  However, there is no shortcut around reading every single word of the wiki for anybody involved in importing these data.  If wiki words aren't relevant, they either weren't in the right wiki or they could have and should have been deleted.  As I wasn't sure of the actual direction of the project, I added what I thought would help.  I would much rather have more (even extraneous) guidance and instruction which later got deleted as superfluous than not enough, leaving volunteers with more questions than answers.  Call this a failure to edit the wiki properly, though not on my part.
>
>> I will confess that there have been small groups in face to face meetings in small cafes where you need a password to logon to the internet.  He was not specifically invited to them all.
>>
>> I confess we have used conference calls and other methods of communication without notifying hundreds of people first.  There have even been meetings that I was unaware of.  For example I haven't even communicated directly with the mappers who are doing most of the import at the moment.
>>
>> There has even been at least one mapathon that Stats Canada only found out about after the event.
> I believe what is being said or conveyed here is that decentralized discussion preceding data input "happens."  Sure, it does, that is part of a planning process and not all of these are "widely open to all of OSM," nor should they be, nor must they be.  So, largely, "we agree" though I'm puzzled at your use of the verb "confess."  Largely speaking, it is the degree to which openness happens in OSM (or the spirit of moving it in that direction, especially when identified as "we need more here") which is important, not specific cases where openness didn't happen.
>
>> Personally I'm not convinced that OpenStreetMap really needs every building in the planet mapped in detail.
> I don't wish to change your mind, but as you point out later, others seem to disagree with you, seeing the urgency with which these data enter OSM.
>
>> The history was I was after the bus stops in Ottawa which meant I needed them with an open data license we could use.  I used to work at Stats Canada and the corporate culture is very different to OSM.
> Understandable and nothing wrong with that, especially as OSM does not seek to house our data with Stats Canada.  However, the reverse...we know the story.
>
>> In Canada we have fewer mappers on the ground and more places to map than in many parts of Europe.  We have a history of importing CANVEC data which comes from a number of sources including Municipalities.  So I acted in a coordinating role.  We managed to persuade the City of Ottawa to change its open data license to align with the federal one.  I got my bus stops.  The local mappers were very much involved and there were at least half a dozen face to face meetings that took place.  I drifted down to one of them.
>>
>> Stats was very pleased with the added tags on the building outlines in Ottawa. This is information they felt could not be easily obtained in any other way.
> Informative and appreciated.  There are "pockets of uniqueness" all over the world and hence methodologies of "this is a good match here" for data entering OSM which will and do widely differ around the world.  However, I believe all can agree that "quality data are quality data" (as well as the opposite) and for this fundamental reason, OSM has standards to follow.
>
>> I am very aware that this data is important to many.  This includes Federal government departments and agencies.  They were very vocal at a meeting at Stats Canada during the HOT summit in Ottawa.  It was open and at least half a dozen OpenStreetMappers were present, three or four were from European or other out of town locations.  Having the building data in one place makes it much easier for the end users than having to handle different formats and open data licenses.  Currently one municipal social agency is very interested in mapping places where fresh food can be obtained.  I forget some of the other interests but they were quite legitimate.  We have seen considerable interest by high schools and students in OpenStreetMap and using StreetComplete with building outlines is one way that they can add value without causing too much havoc.
> These are precisely the sort of reasons why OSM (with high quality, usable, local data) is so important.  Nobody disagrees with "high value data provide high value solutions" as an equation that many use.  The "front end" of that, how the data enter, is obviously key here.
>
>> After we imported Ottawa a group of mappers decided that we needed more buildings.  They organised mapathons with new mappers and mapped buildings with iD.  The results were not good and the data quality side was raised in talk-ca.  I was involved in one where I set up new mappers with JOSM and the buildings_tool plugin and that went much better as far as accuracy was concerned.
> Indeed, this is a typical "use case" in OSM:  a feedback loop says "not good results," so improvements to process hopefully ensure that the next iteration yields better data/results.  Congratulations on those successes, they are more of the good stuff of which OSM is made.  "The journey is the reward" is part of what's important in the process.  Although, good data as a result is important, too.
>
>> The result of these mapathons and the community reaction was to convince Stats Canada that releasing more building outlines as was done in Ottawa under an Open Data license was a way forward.  Kingston in particular was keen to release its building outlines and get them into OpenStreetMap.  Obtaining them and making them available was a Stats Canada decision and was made in their time frame.
> But, was it made within OSM's OWN tenets and timeframes?  That's a crucial consideration I continue to feel receives short shrift (as you seem in the mood to "confess").
>
>> Given that Stats Canada released the data under an acceptable Open Data license I thought and still think the best way forward was to set up a plan and a process to import the data.  The alternative was probably going to be Ad-Hoc importing.
> I, too, think (and OSM knows) that the best way forward (with importable data) is to set up a plan with process.  I thought we did so with the BC2020 "reboot."  Yet, it isn't working, or is only partially working with limited success (I'll look at the portion of the glass that is full rather than calling it empty when it isn't).  So, yet again, let's do a mid-course (or perhaps early-course) correction and right the ship.  Really, we seem to largely agree!
>
>> I suspect that talk-ca is probably the most appropriate mailing list for this sort of discussion which is why I emailed Nate directly.
> We can move this to talk-ca if you like, I'm OK with that.
>
> Thanks for continuing good dialog,
> SteveA
>

