[Talk-us] Anyone ever talked about adding more Land Ownership data to OSM?
Kevin Kenny
kkenny2 at nycap.rr.com
Wed Jan 9 02:52:56 GMT 2013
On 01/08/2013 12:17 PM, derrick nehrenberg wrote:
> It seems like the majority consensus is that the US Land Ownership (or
> management data) doesn't belong in the OSM database. So, I guess I won't
> be adding it.
[... argument for why the USFS Managed Land cadastre belongs ...]
> If anyone has any further arguments for why US Land Ownership (or
> management) data isn't or is a good candidate for OSM crowdsourcing, I
> would definitely be very interested in hearing that.
Importing data such as this is the subject of recurring impassioned
arguments. I think there are some fundamental misunderstandings that
cause the arguments to shed more heat than light. Since, like Derrick,
I'm interested in hiking maps, I think I ought to share my experiences
with producing them. Essentially, I'm trying to sort out for New York
State the data sources to do something similar to what TopOSM
(www.toposm.org) does for Massachusetts. You can see a work in
progress - the southeast corner of New York - at
http://kbk.is-a-geek.net/catskills/test2.html .
OSM is not, nor ought it to be "one-stop shopping" for all the
geodata that go into a map.
The lack of a particular type of object in OSM in a given region
is usually the rationale for advocating an import: "My maps will all
need this, and other people's maps will, too - so it should be in
OSM." But that reasoning is fundamentally flawed. A typical
map will be built from many data sources, not limited to OSM. In fact,
the map linked to above has over two dozen layers, and ten public
datasets (some of them represented by multiple shapefiles) went into
its construction. Nobody argues that topography belongs in OSM, for
instance.
In fact, some of the data sources in the map above partially duplicate
OSM. There are polygons in OSM for New York's state forest preserve -
but the map production actually filters those out. Why? Because (a)
there is a newer version of that file at NYSGIS, and (b) the import had
errors. Why not reimport? Mostly because the original letter granting
permission has been lost, and the license terms offered to the public
on the NYSGIS web site are inconsistent with ODBL. Moreover, there's no
point in reimporting. Nobody edits the forest boundaries in OSM. There
really isn't anything sensible that a mapper can do with them. You
can't see the boundaries on aerial photographs, and for the most part
getting to them in the field would involve off-trail hiking over
densely vegetated and extremely steep terrain. The preserve's boundary
is frequently unmarked or indicated only by faded and indistinct paint
blazes, and the only certain way to establish metes and bounds is to
locate the steel stakes that surveyors drove at corner points, usually
with a metal detector. Because of these difficulties, the state's
geodata can be generally regarded as the most authoritative available
source.
Similarly, I ignore all hydrography in OSM, because OSM's water
features in my part of the world are incomplete. Instead, I use
NHD as a data source. Where I am, it's quite complete and accurate.
(I understand that is not the case in some other places.) Hydrography
is another thing that's hard for local mappers to do; in many places,
rivers, streams and ponds are on private property and cannot be surveyed
on the ground. Shorelines are often indistinct in aerial photos.
Of course, there are other places where local mappers could improve
things significantly. In places where they can approach lake shores
and riverbanks on foot or by boat, they can generate authoritative data.
For this reason, I think of hydrography as a hybrid case, one where the
imported data would sometimes be sacrosanct and sometimes benefit
from crowdsourcing. I'll return to this point shortly.
I bring in topography (both contour lines and hillshading) from NED,
wetland polygons from the USFWS National Wetlands Inventory, and forest
amenities (car parks, trail shelters, viewpoints, information kiosks,
boat launches, etc.) from files that a colleague obtained under
the Freedom of Information Law. The wetlands and elevation data, like
the public land polygons, are not something that local mappers are
going to change. Since many maps omit them (they appear most often
on maps for outdoor recreation), it's probably not worth importing
them - let mapmakers who want them add the appropriate layers. The
forest amenities could be imported - and in fact their placement in
the state-level geodata is poor enough that local mappers could
significantly improve on it. That's the third case - all the features
in that layer are easily spotted in the field, readily accessible, and
local mappers could place them all on the map.
The really complicated situation comes about with roads and trails - and
you'll notice that the map linked above has a problem with them. It has
a number of sources of data: a NYS Department of Environmental
Conservation (DEC) data set of roads and trails on DEC-managed lands;
a series of data sets (again obtained by a colleague under the
Freedom of Information Law) of GPS tracks from walking the trails in
state parks; a series of personal GPS tracks; and OSM itself. None
of these data sources has everything, and a great many objects are
duplicated among two or more of these. Moreover, all of these data
sources contain errors - for instance, the road running east
from the Devil's Acre shelter south of Hunter Mountain is a
rugged hiking trail and exists as a road only in the fevered imagination
of the Census Bureau workers who produced the TIGER files. There are
multiple alignments for several trails, and incorrect alignments for
more.
This is the situation that has given imports a bad name. There are
multiple sources of data, none of them perfect, and challenges in
conflating the multiple representations. I can see that the
false road in TIGER aligns closely with a real trail in the
Roads and Trails file and with a track on my GPS, but with the
partial and inexact alignment, an automated conflation tool would
be hard put to identify them.
Even if I cleaned up the braided trails that appear on that map,
by deciding that one or another input is authoritative, I still
would have no way to record the fact in such a way that when the
government issues another version of one of the datasets, I'd be
able to update without going through the entire exercise again.
The OSM data model is weak in its ability to record this type
of decision: "A multiline in TIGER, unique ID 12345678,
named 'XYZ Road', indicated in TIGER to be at this location, has
been deleted intentionally by a mapper," or "A multiline
in TIGER, unique ID 12345678, has been conflated with another
multiline in NYS DEC Roads & Trails, unique ID 987654, and with
a linear feature, OSM ID 23456789." Without this information,
an attempt to update an import either discards the hard work of
mappers who fixed the previous import, or introduces spurious
conflicts and reintroduces bad data that were intentionally
deleted.
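To make the two kinds of decision above concrete, here is a minimal sketch of what such a record might look like if it were kept alongside the data. Everything in it is hypothetical: the names SourceRef, Decision and plan_update, the "deleted" and "conflated" labels, and the TIGER ID 11111111 are made up for illustration, and nothing like this exists in the OSM data model today. The other IDs are the placeholder ones from the examples above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SourceRef:
    """Points at one feature in an external dataset (hypothetical)."""
    dataset: str      # e.g. "TIGER" or "NYS DEC Roads & Trails"
    feature_id: str   # the dataset's own unique ID


@dataclass(frozen=True)
class Decision:
    """One mapper decision about imported features (hypothetical)."""
    kind: str                       # "deleted" or "conflated"
    sources: Tuple[SourceRef, ...]  # external features the decision covers
    osm_id: Optional[int] = None    # surviving OSM object, if any
    note: str = ""


def plan_update(dataset, incoming_ids, decisions):
    """Sort a new release's feature IDs into skip / merge / review."""
    deleted, merged = set(), {}
    for d in decisions:
        for ref in d.sources:
            if ref.dataset != dataset:
                continue
            if d.kind == "deleted":
                deleted.add(ref.feature_id)
            elif d.kind == "conflated":
                merged[ref.feature_id] = d.osm_id
    plan = {"skip": [], "merge": {}, "review": []}
    for fid in incoming_ids:
        if fid in deleted:
            plan["skip"].append(fid)      # do not reintroduce bad data
        elif fid in merged:
            plan["merge"][fid] = merged[fid]  # update the existing object
        else:
            plan["review"].append(fid)    # new to us; needs a human
    return plan


decisions = [
    Decision("deleted", (SourceRef("TIGER", "11111111"),),
             note="'XYZ Road' does not exist on the ground"),
    Decision("conflated",
             (SourceRef("TIGER", "12345678"),
              SourceRef("NYS DEC Roads & Trails", "987654")),
             osm_id=23456789),
]
print(plan_update("TIGER", ["11111111", "12345678", "99999999"], decisions))
```

With records like these, a later release of the same dataset could be sorted mechanically into features to skip, features to merge into an existing object, and features that still need review.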
Boundary information has similar issues; consider the polygons
representing wards of a city, the boundary of the city itself,
the boundary of the county in which the city lies, the boundary
of the state and the boundary of the nation. All of these may
come from different sources and align imperfectly; yet the
intent is that all of them may share segments in the case of
a border town, where ward, city, county and state all end at
the national border. And the question of "which data set is
right" cannot be resolved by a local mapper - the borders
are invisible lines in the field.
I don't have a good answer for the issue of inconsistent data
sources, but it's an unavoidable feature of the real world.
I suspect we can learn a lot from how the open-source software
community handles distributed version-control systems; change
conflicts on open-source software projects are routine, and
there are good tools for merging inconsistent changes into
a consistent whole.
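As an illustration of the version-control idea, here is a minimal sketch of a three-way merge applied to the tags of a single feature. It is an assumption about how the technique might carry over to geodata, not an existing OSM tool: "base" stands for the tags as originally imported, "ours" for the tags after mappers edited them, and "theirs" for the tags in a newer release of the source. The function name and the sample tags are made up.

```python
_MISSING = object()  # sentinel for "tag not present"


def three_way_merge(base, ours, theirs):
    """Merge two sets of tag edits against their common ancestor.

    Returns (merged, conflicts). A tag lands in conflicts when both
    sides changed it, and changed it differently.
    """
    merged, conflicts = {}, {}
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:      # both sides agree (possibly both deleted the tag)
            value = o
        elif o == b:    # only the source changed it: take the source's value
            value = t
        elif t == b:    # only mappers changed it: keep the mapper's value
            value = o
        else:           # both changed it differently: a human must decide
            conflicts[key] = (
                None if o is _MISSING else o,
                None if t is _MISSING else t,
            )
            continue
        if value is not _MISSING:
            merged[key] = value
    return merged, conflicts


base = {"highway": "residential", "name": "XYZ Road", "surface": "unpaved"}
ours = {"highway": "path", "name": "XYZ Road", "surface": "unpaved"}
theirs = {"highway": "track", "name": "XYZ Rd", "surface": "unpaved"}
print(three_way_merge(base, ours, theirs))
```

Tags are the easy part. Geometry, where two alignments of the same trail differ slightly everywhere, is where a real tool would have to do the hard work.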
So, what's the takeaway?
Imports where OSM can take custody of the data and local mappers
can clean it up - This is the perfect case. The forest amenities
might be such a case: the imported file was obtained with difficulty,
so the import is likely not to be repeated, the points of
interest can be verified by local mappers, and OSM can own the
result. It duplicates very little data that is in OSM already -
and the duplication is easily detected by finding nodes within a
given radius of an imported point.
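A minimal sketch of that radius check follows, assuming points are given as (id, latitude, longitude) tuples in WGS84. The function names, the 50-metre default radius and the sample coordinates are all made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; adequate for short distances


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def likely_duplicates(imported, existing, radius_m=50.0):
    """Map each imported point ID to the existing node IDs within radius_m.

    Candidates are listed nearest first. Imported points with no
    nearby node are left out of the result.
    """
    matches = {}
    for imp_id, lat, lon in imported:
        near = []
        for ex_id, ex_lat, ex_lon in existing:
            dist = haversine_m(lat, lon, ex_lat, ex_lon)
            if dist <= radius_m:
                near.append((dist, ex_id))
        if near:
            matches[imp_id] = [ex_id for _, ex_id in sorted(near)]
    return matches


# Hypothetical sample data: two imported amenities, two existing OSM nodes.
imported = [("shelter-1", 42.0000, -74.0000), ("kiosk-7", 42.0500, -74.1000)]
existing = [(101, 42.0001, -74.0000), (102, 42.0100, -74.0000)]
print(likely_duplicates(imported, existing))
```

This compares every imported point against every existing node, which is fine for a few thousand amenities. A larger import would want a spatial index, and a real check would probably also compare feature types before calling two points duplicates.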
Imports where OSM mappers are unlikely ever to edit the data -
Should be done only if there is obvious value added by having the
data in OSM rather than in a separate layer. Mapmakers, generally
speaking, include many layers of data; including a few more because
an import was not done is no big deal. But at least these imports
are mostly harmless - it's easy to identify, update and delete the
imported objects. NHD (I've decided; I know that I was once a
proponent of importing it in bulk), NED, USFWS Wetlands Inventory,
NYS DEC Lands, all fall in this category.
Imports where both OSM and the originator are likely to update the
data - These are the problematic ones. Once we import data, we
own the responsibility to keep it up to date, and as I observed above,
we really don't have the tools to manage repeated merges from
heterogeneous data sources. Arguably, if at all possible, we should
let local mappers redo these - certainly that appears to be
the position of some of the Germans. But it's a daunting task, and
I know that I can be motivated more easily to fix some mistakes in
the map than I can to fill in a huge area of whitespace. (Apparently
others are different, if the simulations are valid.)
In this category, I'd still be tempted to import NYS DEC Roads and
Trails - assuming that the licensing issues can be negotiated - because
it would be filling in significant whitespace, and would introduce
only a handful of conflicts. Most of those are with the earlier
TIGER import, and in those, in the vast majority of cases, the TIGER
data are simply wrong - at the level of indicating roads that
never existed nor could have existed (going up sheer cliffs or
crossing chasms). And mappers could improve the data significantly.
But I don't want to do it yet, because at the present state of
development, it'll just make a bigger mess down the road. Just as
a possible reimport of TIGER is problematic enough to have had
a significant thread of discussion going for weeks, so this import
would generate similar problems in the future. Until the TIGER
importers have a better answer, I don't want to compound the woes.
So: for me, imports seem to be falling into two categories:
"don't do it" (where I can simply add layers to my maps),
and "don't do it yet" (where I'm likely to make trouble for
future mappers). Unfortunately, the sweet spot of "full steam
ahead" appears not really to exist.
This situation disappoints me, because I really want to get rid
of those braided trails. But I haven't had the time to explore
better approaches to conflation and change management.
Sorry that this message has been so long. To paraphrase Pascal,
I had not the time to compose a short one.
--
73 de ke9tv/2, Kevin