[Talk-ca] What we dont know from GeoBase

Richard Degelder rtdegelder at gmail.com
Thu Jan 1 17:40:47 GMT 2009


On Thu, 2009-01-01 at 00:30 -0700, Dale Atkin wrote:
>  
> 
> From: samvekemans at gmail.com [mailto:samvekemans at gmail.com] On Behalf
> Of Sam Vekemans
> Sent: Wednesday, December 31, 2008 9:43 PM
> To: Richard Degelder; talk-ca at openstreetmap.org; Dale Atkin; Michel
> Gilbert; Mepham, Michael
> Subject: What we dont know from GeoBase
> 
> 
>  
> 
> Hi all, 
> 
> I'll try to separate the ideas into separate discussions :)
> 
> 
>  
> 
> 
> We don't know:
> 
> 
>  
> 
> 
> 1. Is GeoBase/GeoGratis going to make available the dataset so that it
> can be shown as Nodes ONLY?
> 
> 
>  
> 
> Define “Nodes ONLY”. With my understanding of ‘nodes’ it’s trivial
> (although IMO of limited usefulness) to back this out of the data that
> is already provided. 
> 
>  

GeoBase/GeoGratis will not provide us with their data in the form of
nodes.  If we want to use nodes we are going to have to take their data
and convert it into nodes ourselves.  They provide the data as is and it
is up to the users, in this case ourselves, to convert it into something
meaningful for our purposes.  They neither care about the outcome, at
least to some degree, nor the process to achieve it.

But we are not going to import GeoBase/GeoGratis as nodes.  OSM uses
ways, as does GeoBase although they have a different term for it, and
the attributes are assigned to the way.  Why should we not keep the
format that the source, GeoBase, and the recipient, OSM, both use?
Converting the source into something different prior to the import and
then requiring that it again be converted into something that is useful,
and happens to be pretty close to the same format that the original data
was in, is a massive waste of time.

Importing everything as a set of nodes, especially when there is not an
obvious relationship or order between them, is going to be meaningless.
As a previous comment on this list stated, connecting these nodes in
the proper order is a non-trivial task where the order is not
obvious.

Because the ways within OSM are going to have the attributes it makes
sense to work with ways from the beginning.  It is these ways that we
are going to be giving the NIDs and use the nodes as a means of defining
the ways.
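
A rough sketch of what I mean, in Python.  The class names, the tag key
"geobase:nid" and the sample values are placeholders I made up for
illustration; they are not defined by GeoBase or by OSM.  The point is
only that the attributes, including the NID, sit on the way and the
nodes merely give it its shape, in order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    """A bare point; in this sketch it carries no attributes of its own."""
    node_id: int
    lat: float
    lon: float


@dataclass
class Way:
    """An ordered list of node references plus the attributes (tags)."""
    way_id: int
    node_refs: List[int]
    tags: Dict[str, str] = field(default_factory=dict)


def way_geometry(way: Way, nodes: Dict[int, Node]) -> List[Tuple[float, float]]:
    """Resolve a way's node references, in order, to (lat, lon) pairs."""
    return [(nodes[ref].lat, nodes[ref].lon) for ref in way.node_refs]


# Hypothetical sample data: three nodes defining one short road.
nodes = {
    1: Node(1, 43.0, -79.0),
    2: Node(2, 43.001, -79.001),
    3: Node(3, 43.002, -79.002),
}
road = Way(
    way_id=100,
    node_refs=[1, 2, 3],
    # "geobase:nid" is an assumed tag key, used here only as an example.
    tags={"highway": "residential", "geobase:nid": "example-nid-0001"},
)
```

Without the ordered node_refs list the three nodes above are just
points; it is the way that turns them into a road.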

Also importing the GeoBase data as a set of nodes would mean that the
import is meaningless on its own.  We are still going to have to connect
these nodes together in order for them to be useful.  Are you proposing
that we take on such a massive make-work project?  Earlier you alluded to a
billion "pylons" across Canada.  They are meaningless without a lot of
people spending a lot of time with editors connecting them together.
And that effort is not going to be made, which will negate the value of
the import.

> 
> We need this because;
> 
> 
> 1.1 - that's how any updates can happen. ... it's like a bar-code for
> everything that is imported, so we know how to deal with it.
> 
> 

GeoBase does not do updates based on nodes but based on the OSM
equivalent of ways.  The relevant data, for GeoBase, is the NID.  As
long as we add the NID to the OSM attributes for the way we can
consider using it for future updates.
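
As a minimal sketch of how an NID-keyed update could work, assuming
each imported way carries its NID in a tag (the key "geobase:nid", the
dictionary layout and the sample records below are all hypothetical,
not an existing OSM or GeoBase format):

```python
def index_by_nid(ways, nid_key="geobase:nid"):
    """Map each NID to the way that carries it; ways without a NID are skipped."""
    return {w["tags"][nid_key]: w for w in ways if nid_key in w.get("tags", {})}


def plan_update(osm_ways, geobase_features, nid_key="geobase:nid"):
    """Split incoming features into matches against existing ways and new ones."""
    existing = index_by_nid(osm_ways, nid_key)
    matched, new = [], []
    for feature in geobase_features:
        target = existing.get(feature["nid"])
        if target is not None:
            # The same NID is already in OSM: candidate for an update.
            matched.append((target["id"], feature["nid"]))
        else:
            # NID not seen before: candidate for a fresh import.
            new.append(feature["nid"])
    return matched, new


# Hypothetical sample records.
osm_ways = [
    {"id": 100, "tags": {"highway": "residential", "geobase:nid": "nid-a"}},
    {"id": 101, "tags": {"highway": "track"}},  # mapped by hand, no NID
]
geobase_features = [{"nid": "nid-a"}, {"nid": "nid-b"}]

matched, new = plan_update(osm_ways, geobase_features)
```

Nothing here needs per-node identifiers; the lookup is entirely on the
way's NID, and hand-mapped ways without a NID are left alone.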
>  
> 
> ??????????????? Huh ??????????????? I must have missed something major
> here, or some major misunderstanding of something because I don’t see
> how nodes are going to help in the update process (at all). Rather
> they’ll make it more difficult, and just confuse the issue. 
> 
>  
> 
> 
> 1.2 For those areas of EXTREME osm coverage, it is handy, because it's
> easy to spot where new mapping work is needed.
> 
> 
>  
> 
> 
> 2. Re: Road Name/Numbers
> 
> 
> Is geobase going to have all of the road names and numbers available
> for all the provinces?  and when?
> 
> 
>  
> 
> This is contingent on deals being made with the various provincial
> authorities. Here is the current status:
> 
> http://www.geobase.ca/geobase/en/partners/index.html#nrn
> 
>  
> 
>  
> 
> 
> We need to know this because:
> 
> 
> 2.1 StatsCAN already has available, so it could be an option to grab
> the raw data directly from there?
> 
> 
>  
> 
> This is an *EXTREMELY* bad idea. Trust me. The positional accuracy of
> the StatsCan dataset is garbage, they even say so (although not in
> such extreme language) in one of their write-ups. One might be able to
> use it to manually transfer street names over (or a very clever
> programmer might be able to work out some AI to do it, but that
> wouldn’t be me), but anyone trying to use it for positional accuracy
> will be sadly disappointed. 
> 
>  
> 

Even if StatsCan had extremely accurate data the licensing precludes our
having access to it currently.  So looking at what they offer is a waste
of time.  It is, apparently, going to be incorporated within GeoBase
eventually, and at that time will become available to us, but until that
time it is off limits.  When it is incorporated within GeoBase it will
be related to their NIDs and so can be used to update OSM if we
want.

>  
> 
> 2.1.1 but does statsCAN also contain Road numbers?
> 
> 
> http://www.statcan.gc.ca/bsolc/olc-cel/olc-cel?catno=92-500-XWE&lang=eng
> 
> 
>  
> 
> 
> 2.2 For the issue of QUALITY data.  we know that it is not possible to
> manually copy the road numbers FROM geobase data TO osm data. ...   
> 
> 
> but... manually copying all the other features FROM osm data TO
> geobase roads, is much easier.
> 
> 

You are implying that we erase the current OSM data, except in possibly
areas that are very highly developed within OSM, and replace it with
GeoBase data.  That runs counter to
http://wiki.openstreetmap.org/wiki/Automated_Edits/Code_of_Conduct and
is not the best solution in any case.  Even within highly developed
areas of the map there are going to be the odd street missed within OSM.
Ignoring those areas with an import is going to miss those streets as
well.  And wiping out data, even within poorly mapped areas of OSM, is
not going to mean that the GeoBase data import is going to have all of
the data that exists.

As for manually copying all of the data from one data set to another who
are you proposing will do that?  How many hours are you going to be able
to devote to doing that within the next year?  And how much of Canada do
you think that will cover?  It took three years for the current map of
Canada to develop to this point with increasing numbers of mappers.
With a wholesale replacement of the map how many of the current mappers
are going to continue to participate?  And how many new mappers are
going to come into the project to help clean up the data to replace
those that leave?
>  
> 
> 
> Why? Possible is a very strong word. I wouldn’t copy from geobase to
> OSM, as there is way too much data for that. I don’t know how much
> data is there to be copied from OSM to the Geobase set, but this might
> be more do-able from a simple volume perspective. 
> 
>  
> 
> 
> 3. Re: CanVec data
> 
> 
> Then why has this dataset been created? ... why isn't all the data
> which is available ONLY on CanVec, just simply merged into the
> GeoBase dataset?.. 
> 
> 
>  
> 
> Geobase Only contains very specific data. It has a different goal than
> the CanVec dataset. That being said, Geobase has done an excellent job
> of getting certain data (roads in particular) available on a very
> liberal open license, so it makes sense to pull the information from
> where it’s already available.  For our purposes (or at least for mine)
> it makes sense to pull roads from Geobase, rather than CanVec for the
> simple reason of not wanting to wait for an update to the CanVec
> dataset. 
> 
>  
> 
> 
> We need to know this because:
> 
> 
> 3.1 It makes it rather confusing when looking at the canvec data list,
> to see the chart "included in geobase = yes" ... but then we don't
> know, which one is more accurate?  ... why even include it in the
> CanVec list?
> 
>  
> 
> My impression, is that neither is ‘more accurate’. Instead that the
> data is the same in both places.
> 
>  
> 
> 
> 3.2 It makes the referencing confusing, as it's a sub-reference which
> is needed. 
> 
> 
> 3.3 Will the next version of CanVec data, only include CanVec data? ..
> and no GeoBase Data?
> 
> 
>  
> 
> Where do you get this idea from? I’ve seen no evidence of this. 
> 
> 
>  
> 
> 
> I think that should cover it so far, sorry if some questions have
> already been answered (sometimes it takes a little while for facts to
> settle in).  As we'd rather not be making decisions from assumptions.
> 
> 
>  
> 
> 
> Cheers,
> 
> 
> Sam
> 
>  
> 
> I’m sorry if some of the above has come off as a little abrupt, but
> I’m getting a little frustrated over here. I feel like I have a
> solution which should work, and should provide a better overall mapset
> for everyone, and provide means for updating with user data in a means
> that will be preserved across revisions (which I think was the main
> problem with wiping out the database and starting over with public
> datasources as a base), but I’ve not really gotten any feedback on it.
> No one has told me “well that won’t work because ‘x’”, or “that isn’t in
> line with the OSM philosophy because ‘y’”.  
> 
>  

Fortunately the philosophy behind OSM is very important.  It is the
whole reason for its existence and it is also where the value of it
comes from.  If we have no philosophy, or ignore it at every instance,
then OSM will cease to exist very quickly.

We are not able to import just anything because doing so will ensure
that we are going to infringe on someone else's copyright or license.
And if we feel free to abuse the copyright or license terms of
others then our own copyright or license cannot be enforced either.

In fact ignoring licensing issues or copyright issues is the quickest
way to cause OSM to disappear.  As soon as any copyright holder finds
their product was the source of anything within OSM without their
permission, hopefully in written form, they have the right to demand its
removal and could force OSM to be shut down permanently.  And licensing
is a fundamental part of the OSM philosophy.

Also importing the GeoBase data is going to take some time to do
correctly.  This is entirely a volunteer project with nobody being paid
to work on it.  Thus it is done by people that have the abilities to do
so in their free time.  We cannot hire more people to do it faster,
although we can certainly encourage more competent volunteers to
participate, and because it is a volunteer organization it also is
decentralized meaning we cannot tell people to do things that they are
unwilling to do.

The fact that we were given this valuable data source about the middle
of November means that we are going to also run into conflicts with
people's schedules over the Christmas and New Year's holidays.  People
are busy with family events at this time of the year.  The fact that
Michel Gilbert was able to import the Geopolitical boundaries before the
new year was a real bonus.

I would rather have us go a little slower and get things done correctly
than do a massive set of imports and then be stuck trying to clean up
the mess.  Doing an import of only nodes may populate the country with
nodes available to the editors but it will not improve the rendered map
in the least.
> 
> Instead we just seem to be talking in circles getting nowhere. 
> 

Then do something like demonstrate a script that can import data.  We
are going to have to develop some of the tools we are going to be using
for the import.  In order to do that effectively we are going to have to
understand the problems and concerns of others.  That, unfortunately,
takes time and expertise and a lot of talking.  As it stands you, and I
for that matter as well, want it to be done but neither of us has shown
how to do it yet.

>  
> 
> May I ask… Who here has the expertise to actually make something like
> this work? (Sam, I don’t know your background, can you actually *do*
> much of what you are proposing, and want the ‘go ahead’ from someone
> or are you trying to convince people who can do what you’re proposing
> that what you’re proposing is a good idea?)
> 
>  
> 
> Dale
> 
> 

Regard the GeoBase data as a gift.  We can use it, as Michel did for the
Geopolitical boundaries import, as we wish.  GeoBase is not going to
give us anything special, nor are they going to do anything special for
any other group wanting their data who is unable or unwilling to pay for
it, so it is entirely up to us to determine what we want and to do it
with the data that they provide.

Operating within a completely volunteer organization is different than
within a business.  It is impossible to set fixed deadlines because
nobody is ever entirely certain about how much time they can commit to
the project.  Volunteers come and go as their lives dictate.  For most
people their hobby, and this is really a hobby for even those involved
with cartography professionally, comes after a lot of other priorities
like family and work and other commitments and so they cannot devote as
much time as even they would like.

Until you are able, and willing, to pay people to work on the project
then we are all going to have to wait for people to get around to doing
things on their own time and schedule.

We are not going to import all of the data from GeoBase but we are going
to, eventually, import the most relevant data into OSM.  It will take
time and the progress will likely speed up a bit now that the holiday
season is over.

And even GeoBase does not have all of the data we want.  We may import
roads in most of the provinces that are not going to have names imported
along with them.  Are you going to check them all out to add the street
names to make the map more relevant and correct?  So even here we are
going to have to wait for someone else to provide the data.  Postal
codes and street addresses are other areas where the GeoBase data is
incomplete and a complete import of the GeoBase data will not make the
map of Canada as complete as it should be.

It will happen in its own good time.  To speed it up provide the tools
that are necessary and progress will be faster.

Richard Degelder




