[Talk-ca] GeoBase2osm & canvec2osm script making

Fri Feb 13 22:21:46 GMT 2009

On Fri, 2009-02-13 at 02:52 -0800, Sam Vekemans wrote:
> Thanks,
> Well, it might not help as much as I thought.
> Anyway, I'm wondering whether or not you're planning on having
> another script that would be able to import the road names ... or do
> you think it would be better if they were entered manually,
> from survey?
> 

If the data does not exist in any form that we can use today then the
only option is for a manual entry when someone does a physical survey.
Once it is available from GeoBase then we should use a script to enter
it.  It will be available from GeoBase but we do not know when at this
point.  GeoBase is in the process of consulting with other stakeholders
to determine a schedule for when the missing data, like street names and
address data, will be made available to GeoBase.

> 
> I guess that as long as the data shows the same tag reference, it
> doesn't really matter what method was used. Right? I think so. 
> 
> 
> As I'm going through the process of creating the canvec2osm script, I'm
> wondering about the usefulness of the NID for buildings, railways, and
> the rest of the relevant CanVec data. Unlike GeoBase data, the source
> tag for this would need to be "Geogratis.ca CanVec Import
> 2009".
> 
> 
> So the big question:  is there really a point to adding the NID for
> this CanVec data?

If CanVec has a NID then it should be incorporated within any import.
The NID is a unique identifier, at least within GeoBase, that will allow
for the future updating and reference for all data pertaining to a
particular item, be it a roadway, a Geopolitical boundary or any other
data that comes with one.

If there is ever to be any hope of updating anything that comes from
GeoBase, and potentially CanVec, then we are going to have to preserve
the NID, or incorporate it into existing data.  It goes a long way
toward avoiding duplicate imports (if something already exists
within OSM with a particular NID there is no reason to try to re-import
the same thing) and toward adding new data for the item (name,
address/address range, other designations, etc.) when it becomes available.
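
To make the idea concrete, here is a rough Python sketch of how an
import script might use the NID to decide what to create and what to
update.  It is only an illustration: the "nid" key name, the sample
values and the plan_import function are made up for this example and
are not part of any existing import script.

```python
def plan_import(existing_by_nid, incoming):
    """Split incoming features into new imports and updates.

    existing_by_nid: dict mapping NID -> tags already present in OSM
    incoming: iterable of tag dicts, each carrying a hypothetical "nid" key
    """
    to_create = []
    to_update = []
    for feature in incoming:
        nid = feature.get("nid")
        if nid is None or nid not in existing_by_nid:
            # Nothing in OSM carries this NID yet, so it is a new import.
            to_create.append(feature)
            continue
        current = existing_by_nid[nid]
        # Only add tags that are missing; never overwrite mapper edits.
        missing = {k: v for k, v in feature.items() if k not in current}
        if missing:
            to_update.append((nid, missing))
    return to_create, to_update


# Made-up sample data, for illustration only.
existing = {"abc123": {"nid": "abc123", "highway": "residential"}}
incoming = [
    {"nid": "abc123", "highway": "residential", "name": "Main Street"},
    {"nid": "def456", "highway": "tertiary"},
]
create, update = plan_import(existing, incoming)
print(create)
print(update)
```

The choice to only fill in missing tags, rather than overwrite, is one
way to avoid destroying data that a mapper has already entered by hand.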

The NID is going to be especially useful with any kind of automated
imports and updates.  Canada is far too large, and has far too much data
available, for us to expect to import the majority of
the GeoBase and CanVec data manually.  We are going to have to use some
automation, although there will also have to be some human
intervention and cleanup afterwards, and GeoBase has developed a tool
to allow this to occur easily.  It is called the NID, and it allows
for the unique identification of all data items within GeoBase.  Any
reference to these items for future updates is made
through the NID.  If at all possible we should attempt to include and
preserve this NID.  And it is not as if the NID is going to take up
a tremendous amount of storage space either.

> ... since we will be adding and enhancing the maps after the import,
> the imported data will become meshed and melded into the OSM map.
> ... 

And since we are planning to incorporate future updates from
GeoBase/CanVec into OSM we are going to be constantly looking at the
data.  When one prime source of our data is going to refer to the NID in
all circumstances we are going to lose a prime reference tool by not
ensuring the NID is incorporated, and preserved, whenever possible.

> I.e., I import all the buildings, then add in more buildings that
> weren't in the set.  There is no need to try to figure out what the
> NID of each one is, as the update set would be used as a reference and
> imported the same way: having both layers visible, then copying
> the features that need to be added to OSM, then adding more tags to make
> the feature more relevant.

Here you are talking about a manual import.  How many buildings can you
import in a day?  How many days a year, how many man hours per year, are
you able to dedicate to importing buildings?  How many man hours do you
think it will take to import all of the desired data from
GeoBase/CanVec/other sources?  Doing it manually is going to take
forever, and it will take a great deal longer if we have to trace every
building and then also copy the relevant data.  We do not have the
manpower to do it within our lifetimes, especially since we are not
able to do it full time but only in our spare time.

If we are going to be successful with the import of the
GeoBase/CanVec/Whatever data we are going to have to automate it as much
as possible.  How many kilometers of roadways have you imported into OSM
since you joined the effort?  I am willing to bet that we have had more
data imported by Michel Gilbert, Steve Singer, and John Peterson
with their automated imports within the last two months than by almost
anyone doing manual imports.  We have to automate the data import as
much as possible, and ensure that it goes as smoothly as possible
without destroying user imported data that already exists.  Michel gave
us the geopolitical boundaries and Steve and John have started to import
the roadway data.  This has gone very well so far and we are only
starting to fine tune the process; we are still learning what we are
trying to do.  As we get better at it we are going to get better results
as well.  And as we get better at it we are going to be able to speed up
the import.  So far we are testing out what we are doing, with very
tentative steps, in order to learn what works and where we need to
change the process.

For a single import, a one shot deal, the NID is immaterial and has no
value.  But we know that we are going to want to update the data within
OSM from GeoBase and to do that we are going to have to rely on their
NIDs.  A major issue is going to be how do we start to add the NID as a
separate attribute to the ways already within OSM.  Why compound the
issue by not including the NID from a source that already includes it
when we are importing the data?

> <open question: Is this issue discussed enough on the wiki??>
> 
> 
> Cheers,
> Sam
> 
> 
> On Fri, Feb 13, 2009 at 2:31 AM, John Peterson <jdp at ix.netcom.com>
> wrote:
>         I'm writing scripts in Ab Initio, a proprietary dataflow
>         language.
>          
>         It has the ability to pull apart xml, manipulate it and put it
>         back together.
>          
>         So if I have two xml streams, I can join them by a common key,
>         and move attributes from one to another and then recreate a
>         file like the original but with the new attributes.
>          
>         I use shp2text to pull the id and the srcstate
>         (matched/alone/unknown) from the RoadMatcher results which are
>         in Shape file format (that my scripting language can't read)
>         and write scripts to merge them with the originals.
>          
>         I think this kind of thing can be done in python too -- and
>         when I'm happy with my scripts (they produce correct results
>         with less manual fixup) I'll make a stab at translating them.
>          
>         Not sure that this solves your problem though ....
>          
>         John Peterson
>         
>                 ----- Original Message ----- 
>                 From: Sam Vekemans 
>                 To: jdp at ix.netcom.com ; talk-ca at openstreetmap.org ;
>                 Ian Dees 
>                 Sent: Thursday, February 12, 2009 11:44 PM
>                 Subject: GeoBase2osm & canvec2osm script making
>                 
>                 
>                 Hi John, 
>                 Great work, :)
>                 
>                 
>                 So I'm working with Ian Dees (who created shp2osm)
>                 
>                 and trying to figure out how to make this happen.
>                 
>                 
>                 The CanVec shape files are similar, in that it's the
>                 same process of converting, except that there are no
>                 duplicates to contend with.
>                 
>                 
>                 My approach is this:
>                 
>                 
>                 I can now convert the Shape file to OSM.  I then have
>                 both the newly created OSM file and the
>                 CurrentData.osm ... I then just copy and paste the
>                 data to the OSM layer.
>                 So the purpose is just to create an OSM file from the
>                 CanVec.shp file. 
>                 
>                 
>                 What I still need to learn to do is automatically apply
>                 tags to this created OSM file.  Any ideas?
>                 
>                 
>                 CanVec uses numeric codes where GeoBase uses words,
>                 so my script would map each number to an OSM tag.
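
A number-to-tag mapping like the one described above could be kept as a
simple lookup table.  The Python sketch below is only a guess at the
shape of it: the numeric codes are placeholders (the real ones would
have to come from the CanVec documentation), and the tags_for function
and the "nid" key are invented for this example.  The source value is
the one proposed earlier in this thread.

```python
# Placeholder codes for illustration only; these are NOT real CanVec codes.
CODE_TO_TAGS = {
    "0000001": {"building": "yes"},
    "0000002": {"railway": "rail"},
}

SOURCE = "Geogratis.ca CanVec Import 2009"


def tags_for(code, nid=None):
    """Return the OSM tags for one feature code, or {} if it is unmapped."""
    tags = dict(CODE_TO_TAGS.get(str(code), {}))
    if not tags:
        # Unknown code: return nothing so a person can review the feature.
        return {}
    tags["source"] = SOURCE
    if nid:
        # Carry the NID along when the source data provides one.
        tags["nid"] = nid
    return tags


print(tags_for("0000001", nid="x1"))
print(tags_for("9999999"))
```

Returning an empty result for an unmapped code, instead of guessing a
tag, leaves those features for manual clean up after the import.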
>                 
>                 
>                 
>                 
>                 re:
>                 9) shp2text to extract the columns I need in xml form
>                 10) homebrew scripts to create standalone, matched,
>                 and unknown files from
>                 the geobase original
>                 11) bulk_upload.pl to upload the new alone sections
> 
>
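
John mentions above that joining two XML streams by a common key can
probably be done in Python too.  Here is a minimal sketch of that idea
using the standard library's xml.etree.ElementTree.  The element names
(roads, way, rows, row) and the merge_by_key function are assumptions
made up for this example; the real layout of the shp2text output and of
the GeoBase originals may well differ.

```python
import xml.etree.ElementTree as ET


def merge_by_key(original_xml, match_xml, key="id", fields=("srcstate",)):
    """Copy selected attributes from match rows onto original elements.

    Elements are joined on the attribute named by `key`.
    """
    original = ET.fromstring(original_xml)
    matches = ET.fromstring(match_xml)
    # Index the match rows by their key for quick lookup.
    lookup = {row.get(key): row for row in matches if row.get(key) is not None}
    for element in original:
        row = lookup.get(element.get(key))
        if row is None:
            continue  # no match result for this element; leave it untouched
        for field in fields:
            value = row.get(field)
            if value is not None:
                element.set(field, value)
    return ET.tostring(original, encoding="unicode")


# Made-up sample documents, for illustration only.
original_xml = '<roads><way id="1"/><way id="2"/></roads>'
match_xml = (
    '<rows><row id="1" srcstate="matched"/>'
    '<row id="3" srcstate="alone"/></rows>'
)
merged = merge_by_key(original_xml, match_xml)
print(merged)
```

The output has the same structure as the original file, with the
srcstate attribute added wherever a row with the same id was found.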