[Talk-ca] GeoBase and OpenStreetMap

Dan Putler dan.putler at sauder.ubc.ca
Wed Dec 17 08:26:49 GMT 2008


Hi Dave,

The time and effort that you spent developing the tool to import the
TIGER data had a huge impact in terms of jump-starting the OSM efforts
in the US, and OSM also needs to be jump-started in Canada for a large
number of areas. Having said this, there are still differences between
the Canadian and the US contexts, and I think they do matter. Some of
these are explained in detail below, but the main difference is that
you will likely find higher levels of involvement from participants at
the various levels of government that have contributed to the NRN than
has been the case in the US (this is an empirical question, but in the
last 36 hours two people who work at different levels of government,
and who are involved in the NRN project, have commented on this thread,
representing half the people who have).

> > There is one important difference between the Canada NRN and the US
> > TIGER data. Specifically, the locational accuracy for the NRN is much
> > better than is the case with TIGER. As a result, the need to undertake a
> > big effort editing ways to fix their locational accuracy isn't going to
> > be nearly as critical (put another way, what do you trust more,
> > someone's Garmin eTrex or the provincial highway department's Trimble
> > differential gps unit?).
> 
> From the OpenStreetmap perspective, I think of it this way: do you want
> to trust some government dude who drove down my street once and decided
> how it should look on a map?  Or, do you want to trust *me* who is on
> the street every day to decide how it should look on a map?

This is also an empirical question, and boils down to what matters more,
personal attention or better technology. I don't have an answer to this,
but it is a question that should be looked at. However, the US Census
Bureau and Natural Resources Canada differed greatly in the level of
effort they put into getting local governments involved. Canada, at the
moment, has two different government-produced road network files: the
GeoBase NRN and the Statistics Canada RNF. The StatsCan RNF data has a
history that is in some ways similar to that of the US TIGER data: both
were created to facilitate the collection of Census data, both have
positional accuracy problems, both got "some" local government
involvement in their creation, and both had to be done "now" at some
point (using whatever data was available) to support the collection of
data for a particular Census. Both agencies know their data has issues,
since it was created to solve the problem of how to direct census
enumerators, but both agencies found that others wanted to use their
data for purposes other than what the agency had intended. In the US,
the Census Bureau embarked on the MAF/TIGER Accuracy Improvement
Project (which didn't involve local governments), while Statistics
Canada decided to join the GeoBase NRN, which was driven by another
agency (Natural Resources Canada) that does (did?) not face the pressure
of having to have some (any) road network available to fulfill their
core mission (i.e., collecting Census data). Instead, Natural Resources
Canada had the latitude (in terms of time and mandate) to develop the
needed relationships at lower levels of government that the US Census
Bureau and Statistics Canada simply didn't have time to develop. Whether
this still holds is arguable, given that Statistics Canada is committed
to moving from their RNF to the GeoBase NRN by the 2011 Census, which
has to be exerting pressure on the process.

> > As a result, the potential loss of information
> > from "forking" the data is relatively more important for the Canada NRN
> > than the US TIGER data.
> 
> I'm not sure I understand your argument.  Are you saying that coming up
> with a plan to merge the OSM changes back into the NRN data is more
> important than it would be for the TIGER?

O.K. I didn't communicate my point well. The ability to move back and
forth between the source and OSM data is equally important for the NRN
and TIGER data (there is a lot the OSM community will add to the data).
My real point is that the need to address positional accuracy issues
(for the empirical reasons I alluded to so far, and will bolster next)
is lower for the NRN than was the case with the TIGER data.

> > In my opinion the current US situation is
> > unfortunate. As a data user you have the choice of one publicly
> > available road network that is very good with respect to locational
> > accuracy (OSM), and another that has much poorer locational accuracy but
> > has address range and local area identifier information (TIGER) which
> > allows it to be used in geocoding and certain types of routing
> > applications (although, its locational accuracy is a problem for this).
> 
> I actually think that locational accuracy is one of the smaller problems
> here.  Yes, some of the TIGER data are atrocious.  But, on a day-to-day
> basis, I'd be willing to bet that the consumer's GPS and things like
> urban canyons cause more problems than TIGER inaccuracy. 
> 
> Are you saying that every single piece of data in the "GeoBase" data set
> has been verified with "the provincial highway department's Trimble
> differential gps unit"?  I'd certainly believe that a good bit of it is.
> *But* a good bit of the TIGER data came from the same place: some very
> precise state and local government surveys.  I believe the stuff created
> from aerial (not even satellite) maps tends to be the worst for
> locational accuracy.

The NRN has a field indicating the source of the data (in the shapefiles
it is ACQTECH). Based on this field, 94% of the road segments in BC come
from GPS readings, and 6% come from orthophoto images (ortho rectified
images from aircraft). In Alberta 90% are derived from GPS readings, and
just under 10% come from orthoimages (ortho rectified satellite images).
In Saskatchewan, 100% come from GPS readings. This is not to say that
everything comes from a Trimble differential unit. For Nova Scotia the
ACQTECH field indicates that only 7% comes from GPS readings, while 92%
comes from "Vector Data" (which is just as mysterious as it sounds), and
the remaining 1% comes from "other" sources (which are a complete
mystery). While there will be "problem areas", they are very unlikely
to be as common for the NRN as they were for TIGER.
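
The percentages above are simple shares of road segments by ACQTECH
value. A minimal sketch of that tally follows, assuming the ACQTECH
column has already been read out of the shapefile's attribute table
into a list (reading the shapefile itself would need a separate
library). The function name and the sample labels are hypothetical,
and the real field may hold coded values rather than these strings.

```python
from collections import Counter


def acqtech_shares(values):
    """Return {acquisition technique: percent of segments}, to one decimal."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tech: round(100.0 * n / total, 1) for tech, n in counts.items()}


# Hypothetical sample: 100 segments with made-up labels, not real NRN data.
sample = ["GPS"] * 94 + ["Orthophoto"] * 6
print(acqtech_shares(sample))
```

Note that this counts segments, not kilometres of road, so a province
with many short GPS-surveyed segments would look better on this measure
than it would if the shares were weighted by length.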

As an aside, the close-in, high-resolution images in Google and Yahoo
maps come from ortho rectified aerial imagery, not satellite imagery.
To quote the Wikipedia article on Google Maps:

"Although Google uses the word "satellite", most of the high-resolution
imagery is aerial photography taken from airplanes rather than from
satellites."

The folks tracing ways using Yahoo imagery on OSM are really tracing
ortho rectified aerial imagery, not satellite imagery. You are right
that non-ortho rectified aerial imagery can be a real problem (and may
have been used in the creation of the original TIGER data given its
constraints), but this doesn't appear to be the case for the NRN.
> 
> > If the TIGER TLIDs had been maintained on the OSM ways life would be a
> > lot easier for a lot of potential users of the OSM data.
> 
> The TLIDs were preserved for each and every way.  I've introduced
> changes into JOSM to even preserve them as data are modified.  Why would
> you think otherwise?

You did (and thank you), but that doesn't mean that the TLIDs have been
preserved by others. Below is part of an XML export I did two days ago
for two ways in the Sunnyvale-Mountain View area of Santa Clara County,
California:

  <way id="28485435" visible="true" timestamp="2008-11-28T19:00:14+00:00" user="corevette">
    <nd ref="189707444"/>
A bunch more I've removed
    <nd ref="26029634"/>
    <tag k="highway" v="primary"/>
    <tag k="name" v="E El Camino Real"/>
    <tag k="history" v="Retrieved from v19"/>
    <tag k="created_by" v="Potlatch 0.10f"/>
    <tag k="oneway" v="yes"/>
  </way>
  <way id="28617871" visible="true" timestamp="2008-11-25T08:27:52+00:00" user="adbrown">
    <nd ref="193200393"/>
A few I've removed.
    <nd ref="65490039"/>
    <tag k="tiger:tlid" v="122956515"/>
    <tag k="tiger:separated" v="no"/>
    <tag k="highway" v="primary"/>
    <tag k="tiger:county" v="Santa Clara, CA"/>
    <tag k="tiger:source" v="tiger_import_dch_v0.6_20070809"/>
    <tag k="tiger:name_direction_prefix" v="E"/>
    <tag k="created_by" v="Potlatch 0.10f"/>
    <tag k="tiger:name_base" v="Middlefield"/>
    <tag k="tiger:name_type" v="Rd"/>
    <tag k="name" v="E Middlefield Rd"/>
    <tag k="tiger:reviewed" v="no"/>
    <tag k="tiger:upload_uuid" v="bulk_upload.pl-9f300d22-5de3-4867-bd5e-8c2a200c22ad"/>
    <tag k="tiger:cfcc" v="A41"/>
  </way>
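
Whether a way in an export like this still carries its tiger:tlid tag
can be checked mechanically. The following is a rough sketch using only
the Python standard library. The sample string is a heavily abbreviated
stand-in for the export (node references and most tags dropped), the
<osm> wrapper element is assumed, and the function name is made up for
illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-in for an OSM XML export; only the tags needed here.
SAMPLE = """<osm>
  <way id="28485435">
    <tag k="highway" v="primary"/>
    <tag k="name" v="E El Camino Real"/>
  </way>
  <way id="28617871">
    <tag k="tiger:tlid" v="122956515"/>
    <tag k="name" v="E Middlefield Rd"/>
  </way>
</osm>"""


def ways_missing_tlid(osm_xml):
    """Return the ids of ways that have no tiger:tlid tag, in document order."""
    root = ET.fromstring(osm_xml)
    missing = []
    for way in root.iter("way"):
        keys = {tag.get("k") for tag in way.findall("tag")}
        if "tiger:tlid" not in keys:
            missing.append(way.get("id"))
    return missing


print(ways_missing_tlid(SAMPLE))
```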

One way has the TIGER TLID attribute, the other way does not (the person
doing the editing of the way missing the TLID probably didn't understand
the relevance of it, and didn't think it mattered). Moreover, in looking
through the data in this area, I determined that a _large_ number of
ways cover multiple block faces (so can't have a single TLID). In these
cases, the only way I can see to attach the TLIDs back on to them is to
import them into GRASS, clean and break the ways at intersections, and
then attempt to match them back to the original TIGER data using a
combination of street name, proximity, and bearing. It can likely be
done to an acceptable level of precision, but what a pain! If you have
the tools to deal with these issues already, and do so on a regular
basis, my concerns are misplaced. If not, then they are legitimate
concerns.
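
To make the matching step concrete, here is a rough sketch of the
"street name, proximity, and bearing" test, assuming the ways have
already been cleaned and broken at intersections so that each piece is
a single straight-ish segment. Everything in it is an assumption made
for illustration: the dictionary layout, the function names, the
thresholds (30 m and 20 degrees), the coordinates, and the T1/T2/T3
identifiers. Midpoint distance is a crude stand-in for a proper
line-to-line distance, and this is not how any existing tool does it.

```python
import math


def normalize_name(name):
    """Lower-case, drop periods, and collapse whitespace."""
    return " ".join(name.lower().replace(".", "").split())


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a spherical Earth."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0


def axis_difference(b1, b2):
    """Angle between two bearings, ignoring digitizing direction (0 to 90)."""
    d = abs(b1 - b2) % 180.0
    return min(d, 180.0 - d)


def midpoint(seg):
    (lat1, lon1), (lat2, lon2) = seg["start"], seg["end"]
    return (lat1 + lat2) / 2, (lon1 + lon2) / 2


def best_tlid(osm_seg, tiger_segs, max_dist_m=30.0, max_angle=20.0):
    """Return the TLID of the closest same-name, similar-bearing candidate, or None."""
    name = normalize_name(osm_seg["name"])
    m_lat, m_lon = midpoint(osm_seg)
    b = bearing_deg(*osm_seg["start"], *osm_seg["end"])
    best = None
    for t in tiger_segs:
        if normalize_name(t["name"]) != name:
            continue
        t_lat, t_lon = midpoint(t)
        dist = haversine_m(m_lat, m_lon, t_lat, t_lon)
        angle = axis_difference(b, bearing_deg(*t["start"], *t["end"]))
        if dist > max_dist_m or angle > max_angle:
            continue
        if best is None or dist < best[0]:
            best = (dist, t["tlid"])
    return None if best is None else best[1]


# Hypothetical data: one OSM segment and three TIGER candidates.
osm_seg = {"name": "E Middlefield Rd",
           "start": (37.3900, -122.0500), "end": (37.3900, -122.0490)}
tiger_segs = [
    # Same street, digitized in the opposite direction, a few metres off.
    {"tlid": "T1", "name": "E. Middlefield Rd",
     "start": (37.39005, -122.0490), "end": (37.39005, -122.0500)},
    # Same name but perpendicular, so the bearing test rejects it.
    {"tlid": "T2", "name": "E Middlefield Rd",
     "start": (37.3900, -122.0495), "end": (37.3910, -122.0495)},
    # Right place, wrong name.
    {"tlid": "T3", "name": "Other St",
     "start": (37.3900, -122.0500), "end": (37.3900, -122.0490)},
]
print(best_tlid(osm_seg, tiger_segs))
```

Bearings are compared modulo 180 degrees because the same street may
have been digitized in opposite directions in the two data sets. Exact
name matching after normalization is the weakest part of the sketch,
since renamed or abbreviated streets would fall through it.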

Dan

-- 
Dan Putler
Sauder School of Business
University of British Columbia




