[Imports] Ongoing Canadian building import needs to be stopped, possibly reverted

OSM Volunteer stevea steveaOSM at softworkers.com
Thu Jan 17 17:47:23 UTC 2019

Yaro Shkvorets <shkvorets at gmail.com> wrote
> 1) There was a discussion going on in the import list (starting with Ottawa
> import a year ago) and in the slack channel. If you have any concerns let's
> talk.

I know nothing about the Slack channel discussion (a proprietary communication methodology I will not use in an open data project), but there has been discussion (in talk-ca, in the wiki...) for far longer than that.  I noticed problems with this "import" (approach at having problematic Canadian building data) as data began to enter OSM:  it was rife with legal, technical and quality problems in 2017.  On talk-ca, in many personal emails and with a genuine attempt at politeness, a helpful attitude, and reminders of "here's the way things must be done in OSM, nobody gets a pass on following our guidelines and rules," I have poured countless hours into (positively) critiquing the process and substantially improving the wiki that was written for it ("BC2020" project), all while attempting to correct a national-level (STATSCAN, the federal Canadian Bureau of Statistics) "invisible hand" of direction, which piloted this aircraft directly into a suicide tailspin.

My main complaint was that OSM was being used as a "dumping ground" repository for poor-quality data at the same time OSM's (severely resource-constrained) Legal Working Group was tasked with making sense out of a mess of non-compliant municipal so-called "open data" licenses from cities across Canada.  The problems were fairly massive, yet I wanted to offer helping hands, fully aware that I'm not Canadian; rather, I'm an OSM volunteer who cares deeply about high-quality data entering our map with methodologies agreeable to our community, standards and tenets.  OSM has those, but (some) Canadian OSM users intent on getting massive and problematic building data into our map simply didn't respect them:  "import away, anyway" was the charge-ahead cry two years ago and, even with course-correction and a reboot as wounds are apparently licked, it appears to be the charge-ahead cry today.  What happened to "correct the data FIRST, then import them?"  Nothing, that's what.

A "reboot" of the project started a couple months ago and I largely stayed out of the way, hoping that the "lessons learned" (a principal at STATSCAN said "sometimes these things simply don't hatch right") would finally "stick" and improvements could be expected.

Alas, we have very mixed results:  while Yaro says he is able to successfully import many Tasking Manager "squares," we also have Danny McD's overlapping buildings that are "unacceptable" and this "has to be dealt with."  To wit, an essential admission that the process has broken down ("Task Manager instructions...clearly requests to validate and fix all building-related errors and warnings") has taken place.  This is a failure at the most basic level of an import, one which (many think) should be watched like a hawk at national and international levels, given the mess that happened during the first attempt to import.  Further, Yaro says both that he ALWAYS uses "Replace Geometry" on buildings (with meaningful tags) and that executing this for each pair of problematic buildings "would be insanely time-consuming."  Yes, Canada is importing millions of buildings:  what part of "that would be insanely time-consuming" surprises anybody?

Further, Yaro says:
> Thanks for flagging issues with the import. I'll ask guys to stop importing
> and address the raised concerns and resume only after everything has been
> dealt with.

Can it be assumed that Yaro is the "import lead technical coordinator" (that's loose, but works) for this endeavor?  Is anybody else "in charge" of this import?  (John Whelan was and possibly is a "guiding hand" in this endeavor, and I cc him as a courtesy, given our past interactions and his recent talk-ca posts on this).

This project remains a very messy process and obviously I want it to improve to meet its goals, but not at the expense of the concerns outlined by Nate:  "truly terrible data quality, the approach seems to be import first, validate second (if ever?), the wiki gives no clear description of the plan for integrating new data with old, the data (are) being integrated in extremely large chunks, (too large to be properly reviewed)," and more.  So, it appears that much or all of what was wrong and true about this massive Canadian building import in 2017 is still wrong and true in 2019.

Really, I don't know how to say this in other ways, other times, or do what I haven't already felt I could do to correct this, so I'm communicating my frustration at a process which appears wholly broken.  If ever there was a test case of OSM "saving itself from an existential crisis," getting this project on track (or pulling its plug) is it.  Good luck to us, OSM:  I'm simply the messenger at this point.

