[Talk-ca] Aylmer/Hull QC: CanVec import overwriting existing edits

john whelan jwhelan0112 at gmail.com
Sun Feb 20 20:55:16 GMT 2011


It's been interesting reading some of the comments on talk, not just talk-ca.

As some of you know I used to work with big databases so I tend to take a
more enamel bucket approach than a test tube one.  I work almost exclusively
in the NCR (National Capital Region), otherwise known as Gatineau/Ottawa.

As background to the deletion in question, which inadvertently went wrong:
I've been working on the Ottawa side of the river and, having been a
programmer for many years, I looked at the specs for OpenStreetMap and noted
that I could modify the XML file that JOSM loads to update the OSM database.
There is also provision for more than one language, via name and name:fr, so
I wrote a program to take an English name and convert it into French based on
a bylaw from the City of Ottawa.  Street becomes rue etc.  One option for
display is Maperitive with a custom rule set such as the one available here:

https://docs.google.com/document/d/1WkJzx5NffRv0TIQgCFFGTQzyqbQ9XDphSLqcjuM8wGM/edit?hl=en

One option is to display the Ottawa street names in French.
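
A minimal sketch of that kind of name conversion, in Python.  This is not
the original program: the function name to_french and the suffix table are
assumptions for illustration, only "Street becomes rue" comes from the text
above, and the actual bylaw list is not reproduced here.

```python
# Hypothetical suffix table. Only Street -> rue is taken from the post;
# the other entries are illustrative guesses, not the City of Ottawa bylaw.
SUFFIX_FR = {
    "Street": "rue",
    "Avenue": "avenue",
    "Road": "chemin",
}


def to_french(name):
    """Return a French form of an English street name.

    Returns None when the name has no recognised suffix, so that such
    names can be reviewed by hand instead of being guessed at.
    """
    base, _, suffix = name.rpartition(" ")
    generic = SUFFIX_FR.get(suffix)
    if not base or generic is None:
        return None
    # French puts the generic part first: "Slater Street" -> "rue Slater"
    return f"{generic} {base}"


print(to_french("Slater Street"))
```

Returning None for anything unrecognised keeps the script from writing a
wrong name:fr value; those names would need a manual look.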

Before running the software though I closely inspected the OSM database for
Ottawa.  I found a number of problems that seemed fairly major to me.  First
I found more than 100 roads simply had the wrong name.  It took a fair
amount of time to spot these and verify the names.

Second, many junctions weren't junctions; there was simply no join.
http://keepright.ipax.at was invaluable for spotting these and I spent many
hours fixing them.  If you print the map it doesn't matter, but if you try
to route on it, it really does matter.
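
To illustrate why a missing join matters for routing, here is a small Python
sketch.  It assumes ways are simply lists of node ids (the ids and helper
names are made up); two roads that cross on the map are only connected for
a router if they share a node.

```python
from collections import defaultdict, deque


def build_graph(ways):
    """Build an undirected graph from ways given as lists of node ids."""
    graph = defaultdict(set)
    for nodes in ways:
        for a, b in zip(nodes, nodes[1:]):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def reachable(graph, start, goal):
    """Breadth-first search: can a router get from start to goal?"""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Both roads pass through node 2: a real junction.
joined = [[1, 2, 3], [4, 2, 5]]
# Second road uses its own node 6 at the crossing: looks fine when printed,
# but there is no join.
unjoined = [[1, 2, 3], [4, 6, 5]]

print(reachable(build_graph(joined), 1, 5))
print(reachable(build_graph(unjoined), 1, 5))
```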

Third, some roads were 100 meters from where they should be; the one that
really stood out seemed to be a satellite tracing where the road had been
taken to the wrong set of traffic lights.

Fourth, the names were inconsistent, which makes electronic searches
awkward: do you enter "Slater St." or "Slater Street"?
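
One way to make such names consistent is to expand a trailing abbreviation.
The Python sketch below is only an illustration; the abbreviation table and
the name expand_suffix are assumptions, not part of any OSM tool.

```python
# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"St": "Street", "Ave": "Avenue", "Rd": "Road", "Dr": "Drive"}


def expand_suffix(name):
    """Expand an abbreviated last word, e.g. "Slater St." -> "Slater Street".

    Only the final word is examined, so a leading "St." (as in Saint)
    is left alone.
    """
    base, _, last = name.rpartition(" ")
    full = ABBREVIATIONS.get(last.rstrip("."))
    if not base or full is None:
        return name
    return f"{base} {full}"


print(expand_suffix("Slater St."))
```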

Fifth, a number of tag values were misspelled or simply wrong, possibly by
as little as a capital letter, leaving the POI unrendered or impossible to
find.  Maperitive has an export tags function which lists these in a CSV
file, making them easy to spot and then correct manually, and I have done a
fair number of corrections on the database for the NCR.
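
A check of this sort can be sketched in a few lines of Python.  The table of
known values below is a tiny made-up subset and check_tag is a hypothetical
helper; it is not how Maperitive works, just the idea of comparing each
value against an expected list.

```python
# Tiny illustrative subset of expected values per key (an assumption).
KNOWN_VALUES = {
    "amenity": {"restaurant", "cafe", "school", "parking"},
    "leisure": {"park", "slipway"},
}


def check_tag(key, value, known=KNOWN_VALUES):
    """Return None if the value looks fine, else a suggested fix or "?".

    A value that differs only by case or stray spaces gets a suggestion;
    anything else is flagged with "?" for a manual look.
    """
    allowed = known.get(key)
    if allowed is None or value in allowed:
        return None
    lowered = value.strip().lower()
    return lowered if lowered in allowed else "?"


print(check_tag("amenity", "Restaurant"))
```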

Sixth, a number of POIs were in the database but not rendered by the default
rules on the web map.  Slipways for launching boats are an example: there
are at least 22 locations in the database but they aren't normally rendered.

Seventh, a number of roads that appear in the CANVEC house address data were
quite obviously not in the Ottawa.OSM database.

It's only recently that these have become available for Quebec.  It was
interesting to see Gatineau appeared to be more consistent.

We had a social meeting in Ottawa of local mappers and I said I was very
tempted to just drop in the roads from CANVEC since it would clean the road
name and junction data up nicely.  The conversation then moved on to data
caching.

I was working with another mapper when they suggested we replace the Ottawa
road data with CANVEC so we did.  Today in Ottawa routing works nicely, the
road names are accurate and the roads are fairly close to where they should
be.  The CANVEC road data has been cleaned up, new roads added in etc.  I
did note that many service roads were not in CANVEC, and you won't be able
to replace CANVEC v6 with v7 without losing a lot of additional
information.  CANVEC, or any other major import, has a drawback: once you
start adding additional tags to the imported ways you can't reimport a new
version without losing data, so you can only do it once.  For example, if
footways are joined to the imported roads and you replace the roads, you
lose the footway junctions.  The US is finding this out currently.

Ottawa has lots of POIs and I think part of that is it is now very easy to
drop a POI onto the map in the right place because of the street numbers on
the roads.

In Quebec the French road names are in the name field, not the name:fr
field, which leads to the interesting problem that if you view Ottawa OSM in
French the Quebec street names are missing.  So I thought to tidy things up
by programmatically copying the street name into the name:fr slot.
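
The copy step can be sketched in Python against a JOSM-style .osm XML file.
This is an illustration under assumptions, not the script that was actually
run: the sample data and the function name copy_name_to_fr are made up, and
it relies on my understanding that JOSM uploads elements marked with
action="modify".

```python
import xml.etree.ElementTree as ET

# Made-up sample in the .osm XML layout; ids and names are not real data.
SAMPLE = """<osm version="0.6">
  <way id="1" version="3">
    <nd ref="10"/><nd ref="11"/>
    <tag k="highway" v="residential"/>
    <tag k="name" v="Rue Exemple"/>
  </way>
  <way id="2" version="1">
    <nd ref="11"/><nd ref="12"/>
    <tag k="highway" v="residential"/>
    <tag k="name" v="Chemin Exemple"/>
    <tag k="name:fr" v="Chemin Exemple"/>
  </way>
</osm>"""


def copy_name_to_fr(root):
    """Copy name into name:fr on named highways that lack name:fr.

    Existing name:fr values are never overwritten. Returns the ids of the
    ways that were changed so they can be reviewed before any upload.
    """
    changed = []
    for way in root.iter("way"):
        tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
        if "highway" in tags and "name" in tags and "name:fr" not in tags:
            ET.SubElement(way, "tag", k="name:fr", v=tags["name"])
            way.set("action", "modify")  # assumed JOSM marker for edited elements
            changed.append(way.get("id"))
    return changed


root = ET.fromstring(SAMPLE)
print(copy_name_to_fr(root))
```

Nothing here deletes or replaces a way; the script only adds a tag, which
keeps the change small enough to inspect in JOSM before uploading.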

I imported the CANVEC house numbers and inspected them; even in Aylmer there
were a number of streets that had street numbers but no roads in the map.

What I meant to do was replace the residential roads before programmatically
adding in the name:fr field.  What I actually did was something different
and that was done in error.  In trying to recover I discovered JOSM has an
undocumented system feature which means undoing ways that have been deleted
is not always possible.  I have a request in currently to delete the change
sets since that would appear to be the only recovery option.

My personal view, which I accept is not held by everyone, is that to be
useful the roads on the map have to be reliable.  Once you have those you
can add value for the end user by dropping in POIs, footpaths, cycle paths,
extra tag fields for speeds etc.

I suspect that OpenStreetMap will split into two sections, one that is
import friendly and one that relies on manual input.  The challenges on the
import side will be to find good sources of data and to manage the process;
I suspect it will be a more restrictive environment.  On the manual input
side the challenge will be data quality.  Data from hand held GPS devices is
fairly good, traced satellite data is more questionable.

Cheerio John

On 20 February 2011 13:42, James Ewen <ve6srv at gmail.com> wrote:

> On Sun, Feb 20, 2011 at 9:41 AM, Richard Weait <richard at weait.com> wrote:
>
> > I'm an advocate of "not importing."  I like to think that I have
> > tempered my default no-imports stance with a realistic compromise of
> > the "well-considered, carefully executed, limited scope import, that
> > might be a net benefit if everything goes perfectly".  That message
> > seems to get diluted when an enthusiastic contributor discovers an
> > interesting dataset, and an import script; all they seem to hear is
> > "Hey!  Imports!  Cool!  Watch me go!!!!1!"  I find that frustrating.
>
> The "not importing" would be a bad thing for OSM... truly gathering
> GPS traces and using them to map an area really is importing data.
> Even drawing from memory could be considered importing. Of course
> that's getting a bit silly.
>
> I think the more accurate wording Richard alludes to is "automatic
> blind importing".
>
> Any work that we do on the OSM project really needs to have a set of
> eyes that are connected to an intelligent brain go over the data to
> ensure the best decisions are being made. Whether the source of the
> information is local knowledge, personally collected GPS traces,
> non-copyright maps, or government source datasets, it needs to have
> someone look at what is being imported to the OSM database to ensure
> things are happening in the best interests of OSM as a whole.
>
> The road matcher script that was used to try and find existing roads,
> and exclude the duplicates worked fairly well to try and keep from
> causing some of the problems seen in Aylmer. I still find places in
> Alberta where duplicate roads exist. Usually the culprit is the fact
> that the first pass at creating roads for OSM was done by hand from
> low resolution imagery. The road matcher script didn't associate the
> existing road with the CanVec road, and the CanVec imported road was
> placed in the OSM database. It takes manual intervention to correct
> this issue.
>
> When using any source data, one has to do due diligence in ensuring
> that the information being imported into the database is the best
> quality data available. If I were to set my GPS up to capture a trace
> with one point every 30 seconds, and then blindly use that trace to
> replace a high quality version of a road that already exists in the
> OSM database, we'd probably hear the same complaints.
>
> The CanVec data is a huge source for data that is available for import
> into OSM, but that just means that we have a lot of data to verify as
> we import it into the OSM data.
>
> As Richard has mentioned, we have some powerful tools, we have huge
> volumes of data available, but using the tools to import the data in
> an ideal way is still an elusive goal. It takes some time and work to
> get what we want to happen the way we want it to happen.
>
> James
> VE6SRV
>