[Talk-ca] Quick update

Richard Degelder rtdegelder at gmail.com
Tue Dec 30 15:55:44 GMT 2008


On Mon, 2008-12-29 at 23:10 -0800, Sam Vekemans wrote:
> Hi all,
> Dale made some great points.
> (and so i add my own ideas)
> 1 there are post IMG files that add up to 50gigs.
> (when the topo contours are removed, it would be a bit less.)
> If its spare computing power u need, my bets is that a college or
> university would be able to donate a powerful computer, right?
> 
> 2 what is the goal of the project?
> We touched on it before, but DO need to dig further, as we have more
> facts available on one side.
> 
> IF our goal is A
> or B, we have a different approach.
> A- to create the most accurate map of canada.
> A1- we can never be uptodate with government data, as it is an ever
> changing master database, with many users looking for different info.

But we can, if we design it right and want to, keep almost current with
the GeoBase data by applying updates as soon as they are released.  For
all practical purposes this would mean that we are current with the
published data coming out of GeoBase in the areas we want.  In fact we
are always going to be ahead of GeoBase in some areas as well.  It is
easily possible that users have already entered data into OSM that
GeoBase does not yet have, meaning that GeoBase is behind OSM in forming
a complete map.

> A2 -the best we can achieve is to create a 'geobase/geogratis update
> import program'

In many ways this is really not an option.  Unless we either wipe out
all of the data already in OSM and replace it with GeoBase, or we do not
import GeoBase where there is OSM data, the effort to update OSM is
really not significantly different from the initial import.

> A3 all imported data needs to have an NID, for the sole purpose of updates.

The NIDs are only going to come from GeoBase, and are really only useful
within a GeoBase context, and so are only going to become available with
a GeoBase import or update.  If we decide that the GeoBase import is
going to be a one time event, as things like the TIGER import were, then
NIDs are irrelevant, either within the initial import or in the future.

> A4 for those areas in canada that were 'missed' during the last
> geobase shapefile compiling, it will show up in a year from now.

Or whenever GeoBase does an update.  And that is assuming that the
relevant data is available to GeoBase at the time.  What if Hamilton, ON
is slow to send in an update to Ontario, which in turn is a little
overloaded at the time and is not as quick to compile data as it could
be?  Then the new data, which might represent a road that has already
been in use for a year by then, arrives just too late to be included in
the next Ontario update.  That data, already a year old, is going to
have to wait until the following Ontario update.  It is easily possible
that the road was added by an OSM user shortly after it was officially
opened and so is within OSM two years before GeoBase knows about its
existence.  It will be waiting for an NID through at least one update
from GeoBase.
> 
> B if our goal is to create the best 'osm user' compiled map, we have;
> B1 the constant problem of trying to manually merge the new data
> available, as the NIDs would be out of sync

Ways, which can represent roads or anything else, are either going to
have an NID or not.  The update process is not going to vary much from
the initial import, if we are going to use NIDs, in that we are going to
look for ways without an NID and, if one is available, assign the
appropriate GeoBase NID to that way.
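
As a rough sketch of that pass (the "geobase:nid" tag key, the dict layout
for ways, and the find_geobase_nid matcher are all assumptions made up for
illustration, not anything GeoBase or OSM defines):

```python
# Hypothetical sketch: ways are plain dicts and "geobase:nid" is an assumed tag key.
NID_TAG = "geobase:nid"


def assign_missing_nids(osm_ways, find_geobase_nid):
    """Give each way that lacks an NID the matching GeoBase NID, if there is one.

    find_geobase_nid is a caller-supplied (hypothetical) matcher that returns
    an NID string, or None when GeoBase has no counterpart for the way.
    Returns the ids of the ways that were tagged.
    """
    updated = []
    for way in osm_ways:
        tags = way["tags"]
        if NID_TAG in tags:
            continue  # already linked to GeoBase, nothing to do
        nid = find_geobase_nid(way)
        if nid is not None:
            tags[NID_TAG] = nid
            updated.append(way["id"])
    return updated
```

The same function would serve for the initial import and for every later
update; ways that GeoBase does not know about yet simply stay untagged until
a later pass.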

> b2 the ability that only roads with no 'extra tags & relations' be
> removed, BUT We keep a back up of the old data for reference

From where?  The GeoBase update or from the OSM map?  If the goal is the
"best 'OSM user' compiled map" then there is no reason to remove
anything from OSM since it is the result of OSM users.  And all of the
GeoBase data comes loaded with a lot of tags and so nothing is going to
be removed from it.  And if the goal is the "best 'OSM user' compiled
map" then there is no reason to look to GeoBase since it is not an OSM
user.

> b3 we know that we CANNOT be upto date with geobase;

If we are good with our update planning and our scripts work we can have
any GeoBase update incorporated within OSM within a month of the GeoBase
update being published.  At the same time we are already going to have
ways within OSM that are not yet available from the GeoBase update.
GeoBase is also never going to even be up to date with its
sources.  There is always going to be a lag between an event and when it
is recorded.  It is quite possible that we will have some data
incorporated within OSM before the provincial updates and there will be
times when the GeoBase update will add new ways.

> i propose that instead of just letting the train go by and picking
> what we want; that we dump all of the trains contents, spilling over
> mapped areas.

GeoBase is not a train.  Consider it as a grocery store that is open 24
hours a day every day of the year.  You go shopping and pick up some
items, let's say ingredients for a cake.  When you get home you realize
that you do not have enough flour.  You can go back and pick up the
flour and have everything you need.  Later when you are home you decide
you want some potato chips but have none at home.  It is just another
trip to the grocery store away.  While there you notice that there are
some new items that they never carried before.  Then you have the choice
of picking them up as well, waiting until later to pick them up, or even
ignoring the new items.  Of course the grocery store is constantly
refreshing its stock so you can always pick up the freshest milk and
fruits and vegetables.

> Then worry about the update program later.

Deciding to just upload the GeoBase data that we initially want and
ignoring how we might want to update it in the future means that the
GeoBase data is indeed going to be a one time event.  We have the option
of deciding now that it will be a one time event, precluding any future
updates or enhancements coming from GeoBase, or of building in the
ability to update and enhance it as updates become available or as
adding more data comes to be seen as valuable.  To just do a data dump
of the GeoBase data into OSM means that we are going to have to start
all over as soon as we decide we want to add a new feature or do any
updates.  Worrying now about the process to update the OSM map with new
or updated data from GeoBase means that we are going to save a great
deal of work later, and it may be what makes later updates possible at
all.
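
To illustrate why keeping the NIDs pays off later, here is a minimal sketch
of how a new GeoBase release could be sorted against what is already in OSM.
The "geobase:nid" tag key and the dict layouts are assumptions for the sake
of the example, not a real GeoBase or OSM format:

```python
# Hypothetical sketch: "geobase:nid" is an assumed tag key, and both OSM ways
# and GeoBase features are modelled as plain dicts.
NID_TAG = "geobase:nid"


def plan_update(osm_ways, geobase_features):
    """Split a GeoBase release into features OSM already has and new ones."""
    known = {w["tags"][NID_TAG] for w in osm_ways if NID_TAG in w["tags"]}
    to_refresh = [f for f in geobase_features if f["nid"] in known]
    to_add = [f for f in geobase_features if f["nid"] not in known]
    return to_refresh, to_add
```

Anything in the second list would still need to be compared against ways
that OSM users entered without an NID before it is uploaded.  Without NIDs
stored from the initial import there is nothing to build the first list
from, and every update becomes a full import again.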

As the quantity of data within OSM increases the complexity of using a
large data dump increases and eventually the prospect of doing such a
data dump becomes less desirable, possibly even counterproductive.  At
some point in time the work required to start a new import of a large
data file becomes too great considering the amount of data that we would
gain.

> 'cause by the time a new update is available, we would already have
> all the features for the cyclemap/hikingmap/wikitravel map.

But OSM is more than just the cyclemap/hikingmap/wikitravel map.  It
incorporates a great deal more and looking to just one set of interests
is counterproductive, especially when we are looking for a large
community involvement.  And looking to do an update of only a select
number of features, at the expense of the entire map, is sure to
alienate many of those that are willing to work for the advancement of
the whole.  And what happens if you, or anyone else for that matter,
decides that they are interested in something else?  Do we have to start
the whole process all over again?  And using the idea of the GeoBase
import becoming a large data dump, and replacing everything that is
there already, what happens to those features that are not covered by
GeoBase?  Does GeoBase have hiking trails?  Why should we then keep
hiking trails within OSM when you are proposing that the entire road
network be replaced by GeoBase derived data?  Why should hiking trails
and cycle paths be any more important than the roadways others have
entered?

> So just merging of the 'richer' osm data would be needed.
> I dont see that as a problem if we can agree that;
> if we can add the 'geobase2osm:fixme' tag
> 
> cheers,
> sam
> 
> ps, u had other questions that i'll get to.
> 
> Happy heated discussion day :)

Richard Degelder
rtdg




