[Imports] Ongoing Canadian building import needs to be stopped, possibly reverted
Nate Wessel
bike756 at gmail.com
Fri Jan 18 00:08:54 UTC 2019
Hi everyone,
I've had a bit of time today to gather my thoughts on this import and I
hope I can offer something more productive to the discussion now. First,
I want to apologize to the importers for the panicked tone of my initial
email and private communications. I saw after a long day that the
buildings were literally one task away from completely swamping my own
neighborhood, and I hope it's understandable that I felt pretty
defensive about it, having put so much time into my own little corner of
the city over the years. So, I want to thank you all for taking that in
stride, and especially for agreeing to stop the import while we discuss
the issues I raised. If I came off as harsh or unappreciative, please be
sure that I didn't mean to. We're all volunteers here and I know
first-hand how much work goes into doing something like this. I'm
actually one of the lead mappers for a building import in my hometown at
the moment - I'm not opposed in any way to imports of buildings if
they're done right.
But I've also spent way too much time cleaning up bad import data -
whether it's TIGER imports from way back when or more recently the
disturbingly sloppy address ranges that showed up last year in Toronto.
In my experience, it takes so much less time to get this right in the
first pass than it does to clean up the damage months or years later
when we realize some mistakes were made or the data could have been
handled better.
There have been a lot of responses to some of the specific things I
said, so instead of replying inline, let me try to rephrase the big
issues as I see them with some of the new perspective and information in
mind.
A ) This import, essentially, did not get approval from the imports
list. While an email was sent, I think that it was so vague and
misdirected (surely with no nefarious intent) that it would be hard or
impossible for a casual subscriber to the list to understand the scope
of the project. Without having understood the scope of the project,
which is utterly huge, the import plan was not given adequate scrutiny.
This is evidenced by the relative lack of discussion.
B ) I didn't know this was going on until I saw it happening. While my
personal knowledge is obviously not a necessary precondition for
successful imports, I do feel it may be a sign that the scale of this
effort is wrong for the task at hand.
While the technical details and any processing of the data are probably
best handled at the national level, since it all comes from the same
source and presumably has the same technical hurdles to overcome, I
can't imagine that the whole country can be asked whether it wants
buildings to be imported or not, or what concerns and requirements would
come attached to such an import. There will be so much local variation
and I think that just has to happen at a more local level. If that local
effort had been made, I'd be surprised if I never heard about it. Rather
than attempt to notify all Canadian mappers, would it be too much to ask
that this might go province by province or city by city? If I had seen
'Toronto' or 'Ontario' anywhere on this mailing list, you can be sure my
ears would have pricked up right quick.
C ) This import is going way too fast - there is simply no way three
people could have carefully imported as much data as has been imported
in the time since this started. Like I said, I'm working on an import
myself and it's long, tedious, and strangely satisfying work when you're
doing it carefully. In my opinion, these task squares are at least ten
times too large. When I said above that my neighborhood would be
swamped by the next task, I really mean swamped. 90% of the places I go
in Toronto fit inside a single task. The tasking manager we're using for
the building import in Hamilton County allows one to upload custom task
geometries. I got a bit silly with the task shapes perhaps
(https://tasks.openstreetmap.us/project/107) but I think the size is
about right - importing 500-1000 building footprints should take ~10-30
minutes, with a careful check of the imagery, a check with JOSM's
validation tool, a second validation after native OSM data has been
merged with the import data... I would never attempt a task as large as
the smallest task here, and I do not think that reflects poorly on my
abilities or experience. If the tasking manager doesn't allow smaller
tasks then it is the wrong tool for the job.
I have several specific technical issues with / questions about the data
that are probably best addressed in some other forum, like on the wiki.
If I may, I'd like to save those for the moment, because I think I see a
productive way to keep moving forward with things while we discuss.
The data needs to be carefully and thoroughly validated at some point,
right? May I suggest that everyone stop importing new data and engage
themselves in cleaning and validating the data that has already been
brought in, neighborhood by neighborhood? There is plenty to keep us all
busy for weeks. While doing that, let's make a list of issues that we
come across and discuss ways that they can be addressed before any new
buildings are brought in. We can take this as a learning experience and
make the rest of this import process better.
I have the feeling that some will feel this is redundant - wasn't the
Ottawa import the test run? My response has to be that the data and the
process are not yet as good as they can and should be, so another round
of trials and iterative improvement is needed before this rolls out a
mari usque ad mare.
With all due respect, patience, and humility,
Nate Wessel
Jack of all trades, Master of Geography, PhD candidate in Urban Planning
NateWessel.com <http://natewessel.com>
On 1/17/19 3:13 PM, OSM Volunteer stevea wrote:
> Thank you, John.
>
> On Jan 17, 2019, at 11:22 AM, john whelan <jwhelan0112 at gmail.com> wrote:
>> First if you look at the 2020 wiki page history you'll see there is a lot of input from Steve. My concern with this very detailed input is it made it hard for a new person to quickly locate relevant information, an overview if you like.
> I encourage an "Overview" section or what some call a "Quick Start." For some (experienced OSM mappers), this could suffice for "jumping in right now." However, there is no shortcut for anybody involved in the importation of these data to read every single word of the wiki. If wiki words aren't relevant, they either weren't in the right wiki or they could have and should have been deleted. As I wasn't sure of the actual direction of the project, I added what I thought would help. I would much rather have there be more (extraneous, even) guidance and instruction which later got deleted as superfluous than not enough and leave volunteers with more questions than answers. Call this a failure to edit the wiki properly, though not on my part.
>
>> I will confess that there have been small groups in face to face meetings in small cafes where you need a password to logon to the internet. He was not specifically invited to them all.
>>
>> I confess we have used conference calls and other methods of communication without notifying hundreds of people first. There have even been meetings that I was unaware of. For example I haven't even communicated directly with the mappers who are doing most of the import at the moment.
>>
>> There has even been at least one mapathon that Stats Canada only found out about after the event.
> I believe what is being said or conveyed here is that decentralized discussion preceding data input "happens." Sure, it does, that is part of a planning process and not all of these are "widely open to all of OSM," nor should they be, nor must they be. So, largely, "we agree" though I'm puzzled at your use of the verb "confess." Largely speaking, it is the degree to which openness happens in OSM (or the spirit of moving it in that direction, especially when identified as "we need more here") which is important, not specific cases where openness didn't happen.
>
>> Personally I'm not convinced that OpenStreetMap really needs every building in the planet mapped in detail.
> I don't wish to change your mind, but as you point out later, others seem to disagree with you, seeing the urgency with which these data enter OSM.
>
>> The history was I was after the bus stops in Ottawa which meant I needed them with an open data license we could use. I used to work at Stats Canada and the corporate culture is very different to OSM.
> Understandable and nothing wrong with that, especially as OSM does not seek to house our data with Stats Canada. However, the reverse...we know the story.
>
>> In Canada we have fewer mappers on the ground and more places to map than in many parts of Europe. We have a history of importing CANVEC data which comes from a number of sources including Municipalities. So I acted in a coordinating role. We managed to persuade the City of Ottawa to change its open data license to align with the federal one. I got my bus stops. The local mappers were very much involved and there were at least half a dozen face to face meetings that took place. I drifted down to one of them.
>>
>> Stats was very pleased with the added tags on the building outlines in Ottawa. This is information they felt could not be easily obtained in any other way.
> Informative and appreciated. There are "pockets of uniqueness" all over the world and hence methodologies of "this is a good match here" for data entering OSM which will and do widely differ around the world. However, I believe all can agree that "quality data are quality data" (as well as the opposite) and for this fundamental reason, OSM has standards to follow.
>
>> I am very aware that this data is important to many. This includes Federal government departments and agencies. They were very vocal at a meeting at Stats Canada during the HOT summit in Ottawa. It was open and at least half a dozen OpenStreetMappers were present, three or four were from European or other out of town locations. Having the building data in one place makes it much easier for the end users than having to handle different formats and open data licenses. Currently one municipal social agency is very interested in mapping places where fresh food can be obtained. I forget some of the other interests but they were quite legitimate. We have seen considerable interest by high schools and students in OpenStreetMap and using streetcomplete with building outlines is one way that they can add value without causing too much havoc.
> These are precisely the sort of reasons why OSM (with high quality, usable, local data) is so important. Nobody disagrees with "high value data provide high value solutions" as an equation that many use. The "front end" of that, how the data enter, is obviously key here.
>
>> After we imported Ottawa a group of mappers decided that we needed more buildings. They organised mapathons with new mappers and mapped buildings with iD. The results were not good and the data quality side was raised in talk-ca. I was involved in one where I set up new mappers with JOSM and the buildings_tool plugin and that went much better as far as accuracy was concerned.
> Indeed, this is a typical "use case" in OSM: a feedback loop says "not good results," so improvements to process hopefully assure the next iteration yield better data/results. Congratulations on those successes, they are more of the good stuff of which OSM is made. "The journey is the reward" is part of what's important in the process. Although, good data as a result is important, too.
>
>> The result of these mapathons and the community reaction was to convince Stats Canada that releasing more building outlines as was done in Ottawa under an Open Data license was a way forward. Kingston in particular was keen to release its building outlines and get them into OpenStreetMap. Obtaining them and making them available was a Stats Canada decision and was made in their time frame.
> But, was it made within OSM's OWN tenets and timeframes? That's a crucial consideration I continue to feel receives short shrift (as you seem in the mood to "confess").
>
>> Given that Stats Canada released the data under an acceptable Open Data license I thought and still think the best way forward was to set up a plan and a process to import the data. The alternative was probably going to be Ad-Hoc importing.
> I, too, think (and OSM knows) that the best way forward (with importable data) is to set up a plan with process. I thought we did so with the BC2020 "reboot." Yet, it isn't working, or is only partially working with limited success (I'll look at that portion in the glass that partially fills it rather than calling it empty when it isn't). So, yet again, let's do a mid-course (or perhaps early-course) correction and right the ship. Really, we seem to largely agree!
>
>> I suspect that talk-ca is probably the most appropriate mailing list for this sort of discussion which is why I emailed Nate directly.
> We can move this to talk-ca if you like, I'm OK with that.
>
> Thanks for continuing good dialog,
> SteveA
>