[Talk-ca] Cleaning up after the GeoBase import
Michael Barabanov
michael.barabanov at gmail.com
Sat Jun 13 03:48:21 BST 2009
There's an additional complication for adding UUIDs: different
topologies, e.g. for dual carriageways. I'm seeing a lot of cases where
OSM has a single way for a street, but GeoBase has two. So assigning UUIDs
cannot be a mechanical process. In fact, I'd argue that in these cases
matching GeoBase UUIDs amounts to throwing out the existing data for a way
and adding it again, if not positionally, then topologically.
And intersections of dual carriageways with smaller roads make it even
more complicated - every little piece has a separate UUID.
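
To make the one-to-many problem concrete, here is a minimal sketch. It assumes
some matcher has already produced (OSM way, GeoBase UUID) pairs; the way ids,
UUID strings and function names are all made up for illustration and are not
real data or any real tool's output. It only flags OSM ways that correspond to
more than one GeoBase segment, i.e. ways that would have to be split or redrawn
before each piece could carry its own UUID.

```python
from collections import defaultdict

# Hypothetical matcher output: (osm_way_id, geobase_uuid) pairs.
# Way 101 stands for a street drawn as one way in OSM but stored in
# GeoBase as two carriageways cut at an intersection.
matches = [
    (101, "uuid-a"),
    (101, "uuid-b"),
    (101, "uuid-c"),
    (102, "uuid-d"),
]

def group_matches(pairs):
    """Collect the set of GeoBase UUIDs matched to each OSM way."""
    by_way = defaultdict(set)
    for way_id, uuid in pairs:
        by_way[way_id].add(uuid)
    return by_way

def needs_manual_review(pairs):
    """Return the OSM way ids that match more than one GeoBase segment."""
    grouped = group_matches(pairs)
    return sorted(way_id for way_id, uuids in grouped.items() if len(uuids) > 1)

print(needs_manual_review(matches))
```

Way 102 could be tagged mechanically; way 101 cannot, which is the case I
keep running into.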
Speaking of quality: while OSM data can be even more accurate
positionally (I've seen a couple of places already), in topological
correctness it does not approach GeoBase data. Duplicated nodes, overlapping
ways, crossing ways and unattached ways are all too common. For
auto-routing maps, this is a huge problem.
The same applies to the naming of streets (St. vs Street and such) and to road classification.
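
As a rough illustration of the duplicated-node and unattached-way problems
mentioned above, here is a small sketch of the kind of check I have in mind.
The data layout (plain dicts of node positions and way node lists), the
coordinates and the function names are assumptions for the example, not the
format of any particular OSM tool. The second check only yields candidates,
since genuine dead ends also show up in it.

```python
from collections import Counter, defaultdict

def duplicate_nodes(nodes, precision=7):
    """nodes: dict node_id -> (lat, lon).
    Return groups of node ids that share the same rounded position."""
    by_pos = defaultdict(list)
    for node_id, (lat, lon) in nodes.items():
        by_pos[(round(lat, precision), round(lon, precision))].append(node_id)
    return [sorted(ids) for ids in by_pos.values() if len(ids) > 1]

def dangling_ends(ways):
    """ways: dict way_id -> list of node ids.
    Return end nodes that belong to exactly one way (possible unattached ends)."""
    use = Counter()
    for node_ids in ways.values():
        use.update(set(node_ids))  # count each way at most once per node
    ends = {n for node_ids in ways.values() for n in (node_ids[0], node_ids[-1])}
    return sorted(n for n in ends if use[n] == 1)

# Made-up example: two ways that look joined on the map but are not,
# because nodes 2 and 3 are separate nodes at the same position.
nodes = {
    1: (49.2500, -123.1000),
    2: (49.2510, -123.1000),
    3: (49.2510, -123.1000),
    4: (49.2520, -123.1000),
}
ways = {10: [1, 2], 11: [3, 4]}

print(duplicate_nodes(nodes))
print(dangling_ends(ways))
```

A router sees ways 10 and 11 as disconnected here, even though they are drawn
end to end.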
It's definitely possible to get to the desired state (first-hand
OSM-mapping derived data combined with systematic, UUID-enabled, QA-assured base from
GeoBase) using the "preserve existing" approach, but really, we should
ask ourselves, what's easier and gets to this state sooner:
1) start fresh (streets/roads-wise), enjoy correct topology and overall
consistency for the whole of Canada from the start, and correct/add to out-of-date data from GeoBase.
2) do a lot of, and I mean a LOT of manual fixes that often amount to
the same -- see what I started with.
Perhaps it's too late to have this discussion (I'm pretty new to OSM, BTW),
but to me it seems very tempting to be able to have a really good map right now,
with very little effort, and incrementally fix it subsequently. I think this
also makes the bidirectional exchange with GeoBase a lot more realistic.
All this said, the areas I've imported all follow the preserve-existing
approach (see user "mbiker"). I've also started to fix up topology, and
that's what prompted the above.
Michael.
On Fri, Jun 12, 2009 at 09:54:28PM -0400, Richard Degelder (rtdegelder at gmail.com) wrote:
> William Lachance wrote:
>
> Look at this from another angle: Should we split up all the existing OSM
> road data that people have put in to add in GeoBase UUID information?
>
> The simple answer is that at some point we are going to have to.
>
> If we want to add the attributes available from GeoBase, and to be able to
> update it from future GeoBase updates, then we are going to have to find a
> way to add the GeoBase UUID information and, to do so, split the ways into
> the same segments that GeoBase uses. Whether we do it manually, which will be
> a lot of work, or someone develops a script to do it, we are going to have to
> do it to utilize the data available from GeoBase for the map. Not doing it
> means we cannot import the GeoBase UUID and cannot benefit from the other
> attributes available within the GeoBase data, which will leave us with two
> classes of roadways: those that have all of the current data available and
> those that are going to require that users manually add any additional
> details in order for them to appear.
>
> At some point we are going to have to go through the entire data set within
> Canada in OSM and look to add the GeoBase UUID and any additional data from
> GeoBase. We will not need to give attribution to GeoBase for data submitted
> by users, nor record the method by which the GeoBase data was acquired, but we
> will have to insert the GeoBase UUID for those segments. From that point those
> ways originally created by users will be handled exactly the same as every
> other segment originating from GeoBase.
>
>
>
> Michael Barabanov suggested:
>
> Let me be a devil's advocate here for a while. The 2 alternatives
> that make more sense are
>
> - Delete the existing street data and start fresh with GeoBase, and always
> maintain the UUIDs properly. Given how many inconsistencies/topology
> errors there are in OSM data for Greater Vancouver, it may just be
> the right thing to do.
> Otherwise we'll be spending months joining existing OSM data to newly
> imported segments. Not a great way to spend resources.
>
> - Don't pay attention to preserving UUIDs/segments, and re-run
> RoadMatcher when we need to import more data.
>
> What we're doing right now is neither here nor there. No benefit of
> GeoBase/UUIDs for existing OSM data and confused renderers for the
> imported GeoBase ones.
>
> Michael.
>
> I would suggest that we do neither. Although the GeoBase data is generally
> very good, frequently excellent, there are times where it is wrong or less
> than ideal. This can be for a number of reasons, many not within the
> control of GeoBase either. Where we have had good people working on OSM,
> using GPS tracks as the basis of their work, we usually also get pretty good
> quality, maybe not to the exacting quality that GeoBase has the potential to
> get. When we started the import we agreed that we would preserve the user
> generated data as much as possible. One of the guidelines for mass imports
> also is to preserve user generated data. Where the user generated data is
> faulty or wrong then we are going to replace it, in the same way that we
> would if we were manually editing the data.
>
> One of the reasons that the GeoBase data may not accurately reflect what is
> really there is because it is only updated periodically. Thus new roadways,
> changes to existing roadways, and other changes are only going to be
> reflected in the GeoBase data sporadically. It takes time for the original
> data to be transferred up from the municipal level, where much of it
> originates, through the provincial level and finally into GeoBase itself and
> allow GeoBase to release a new update for the province. We want to have
> users continue to map and add to the map to keep it really up to date.
>
> Some of the data within GeoBase is also pretty old and generated from poor
> quality sources, some similar to the Yahoo! satellite imagery that we also
> have access to. If we can get a local mapper to correct the data then it
> can be better than the original GeoBase data and improve the map. I will
> frequently take good locally generated data over that from GeoBase and so I
> would be very reluctant to support the idea of wiping out locally generated
> data in favour of that from GeoBase.
>
> If, on the other hand, we ignore the preservation of the UUIDs and segments
> and instead merge them then we are going to go through a great deal of work
> every time we want to add a new feature from GeoBase. Are we going to create
> the new GeoBase compatible data set on the OSM servers, and then merge the
> data again afterwards? The history of the areas we do that in is going to
> take up far more resources than keeping the current GeoBase UUID data as
> part of the map.
>
> Richard Degelder