[Talk-us] Low-quality NHD imports
kevin.b.kenny+osm at gmail.com
Fri Oct 13 13:52:12 UTC 2017
On 10/13/2017 02:06 AM, Frederik Ramm wrote:
there's a LOT of NHD:* (and nhd:*) tags on OSM objects, see
- 1.9 million NHD:FCode, but also 188k "NHD:Permanent_" (note the
underscore), 10k "NHD:WBAreaComI", or 1.5m "NHD:Resolution" just to grab
a few examples.
I haven't researched who added them and when, but they would certainly
not clear the quality standards we have for imports today. Most of this
information can be properly modelled in usual OSM tags, and where it
cannot, it probably shouldn't be in OSM in the first place.
Is there any systematic (or even sporadic) effort of cleaning up these
old imports? Is there reason to believe that the neglect extends to more
than just the tags - do geometry and topology usually work well on
these, or are the funny tags a huge "this whole area hasn't had any love
in a long time" sign?
ON IRRELEVANT TAGGING:
I, at least, ordinarily do not make a specific effort to ferret out
irrelevant tags. For the most part, they're harmless to me. If some
random object on the map happens to have 'zqx3:identifier=2718281828'
among its tags, the only real damage is the diffuse cost of shipping
the data around.
That said, you're quite right that such tags might indeed be a symptom
of a neglected import, or one that was originally done with processes
that wouldn't clear today's bar. Even that has only some bearing on
the data that are meaningful.
ON IMPORTING NHD:
In the specific case of NHD, data quality varies by region. As Dave
correctly notes, Alaska is uniformly atrocious. (There really are no
good mapping data for Alaska. The technical challenge of acquiring
high-quality data for much of the state simply is greater than the
perceived value of the data.)
Where I am, on the other hand, NHD is actually quite good - in the
maps that I render, which are almost all in rural areas, I use it.
I most often use it in combination with OSM and with other data
sources (USFWS national wetland inventory, Adirondack Park Authority
wetland inventory, NYSDOT, ...) which give the rendered maps a
somewhat 'cubist' appearance, but I find that appearance helpful -
it's an indication of data variability, and gives me an idea how much
uncertainty to expect in the field.
The fact that NHD is often quite 'stale' does not bother me at all
locally. I live in a heavily glaciated area, and the cities have been
settled for quite a long time by US standards. Out in the countryside,
the streams run typically in deep ravines, disproportionate to the
size of the streams. They aren't moving anywhere. They most likely
haven't moved significantly since the Wisconsinan glaciation, 14000
years ago.
In the valleys, the detailed course of the streams does shift a bit,
but in the cities and towns, the streams are engineered, and
elsewhere, the terrain tends to be beaver swamp, and the streams shift
with every move of the rodents or every major storm. I never expect
the track of a watercourse within a wetland to be accurate, on any map.
NHD's topology is audited before it is released, so it's at least
consistent (and likely correct).
It's certainly hypothetically possible to map the streams using
'hand-crafted' methods - and I have done so for a few, when I've
happened to follow them in wilderness travel. (I occasionally go
hiking off-trail.) But the OSM community is never going to be able to
do that for the great many watercourses that flow through my extremely
well-watered area. There simply is too much land inhabited by too few
people, most of whom are not well enough connected nor technologically
literate enough to become OSM mappers. (Seriously, in some of these
communities, there is no cell service and only a quarter of the houses
have any sort of network connectivity. It's effectively working with
Third World infrastructure.)
It's virtually impossible to map most of these watercourses as an
'armchair mapper.' Our 'old second growth' timber gives rise to
extraordinarily dense tree cover - denser than true 'old growth'
forest. Even some fairly major watercourses - major enough that I
wouldn't attempt to ford in springtime - are difficult or impossible
to see in aerials.
There has been an OSM project to map lakes and ponds in New York
State, starting from point features giving their names. I've preserved
these tracings in OSM, because I don't replace mappers' work with
imports, ever. Nevertheless, I find them to be uniformly worse than
NHD. They're usually quite rough, and in a great many of them, the
mappers treated mats of floating or emergent vegetation as the
shoreline, making shallow ponds much smaller than they are.
For all these reasons, NHD is what I have in my area. I've never done
a large-scale NHD import, and nobody else has done one around me. If
I need a stream for a rendered map, and don't want the 'cubist' data,
I sometimes import it as a single object from NHD. Where else will I
get it?
(That's pretty much my guideline on when importing is likely to add
value: I as a data consumer have an identified use for most or all of
what I'm bringing in, I have no ready way to acquire the information
by mapping on the ground, and the external data set appears to be of
good enough quality in the places that I have boots-on-the-ground
mapping, and clean enough topology, that I can import without too much
trouble. Nobody's reverted yet.)
ON TAG RETENTION:
When I import, I retain tags that are likely to be useful. Synthetic
tags like 'area' I remove. I do occasionally retain tags that have the
appearance of 'foreign keys' - but they are quite specific and I do
ask mappers please to leave them alone.
As an example, with the public land polygons that I've imported,
I retain single unique ID's. I do repeat imports of those data sets,
and I use the ID's in a semiautomated process for reconflation. (The
reimported data are all checked manually, and I respect the work of
mappers who've modified the import. The ID merely gives the script a
starting point.)
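That semiautomated reconflation step can be sketched roughly as follows. The data structures and function names here are illustrative assumptions, not the actual script: the point is only that a retained unique ID lets a fresh extract be partitioned into candidate updates (matched against the prior import) and genuinely new features, with the matches still reviewed by hand.

```python
# Sketch: match a fresh import extract against previously imported OSM
# objects by a retained unique ID, so only unmatched features are new.
# Structures here are illustrative, not the author's actual tooling.

def reconflate(existing, reimported, id_tag="permanent_identifier"):
    """Partition reimported features into (candidate updates, new features).

    existing: list of feature dicts with a 'tags' dict (prior import,
              possibly since edited by mappers)
    reimported: list of feature dicts with a 'tags' dict (fresh extract)
    """
    by_id = {f["tags"][id_tag]: f for f in existing if id_tag in f["tags"]}
    updates, new = [], []
    for feat in reimported:
        key = feat["tags"].get(id_tag)
        if key in by_id:
            # Candidate update: pair it with the existing object for
            # manual review rather than overwriting mappers' edits.
            updates.append((by_id[key], feat))
        else:
            new.append(feat)
    return updates, new

existing = [{"tags": {"permanent_identifier": "A1", "natural": "water"}}]
fresh = [{"tags": {"permanent_identifier": "A1"}},
         {"tags": {"permanent_identifier": "B2"}}]
updates, new = reconflate(existing, fresh)
print(len(updates), len(new))  # 1 1
```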
When importing single objects from NHD, I remove most of the rubbish
but I do keep 'permanent_identifier'. (The 'PERMANENT_' tag is an
artifact, coming from the fact that some intermediate database
somewhere in the pipeline is limited to ten-character column names.)
I also retain 'reachcode'. That string of digits is, according to
USGS, guaranteed to be stable - they don't reuse them - and encodes
information about the topology of a stream. I know some of the local
codes quite well - I recognize at a glance that codes beginning
with 02020005 refer to waterways that drain to the Atlantic by way
of Schoharie Creek (a locally significant river despite the name,
with dams, reservoirs, power stations), the Mohawk River, and
the Hudson River. It's effectively a machine-readable 'second name'
for the object. The other stuff, that doesn't map to OSM tagging,
I discard.
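A rough sketch of that tag policy, together with the reach-code structure described above (the first eight digits of a 14-digit NHD reach code are the HUC-8 subbasin, as with 02020005; the exact tag spellings in any given import may vary):

```python
# Sketch of the tag policy described above: keep ordinary OSM tags and
# the stable NHD identifiers, drop the leftover NHD:* attribute baggage
# (FCode, Resolution, WBAreaComI, ...). Tag spellings are illustrative.
KEEP = {"permanent_identifier", "reachcode"}

def clean_nhd_tags(tags):
    """Return tags with NHD:* attributes removed, stable IDs kept."""
    out = {}
    for key, value in tags.items():
        if key.lower().startswith("nhd:") and key.lower() not in KEEP:
            continue
        out[key] = value
    return out

def huc8(reachcode):
    """First eight digits of a 14-digit NHD reach code: the HUC-8
    subbasin, e.g. 02020005 for the Schoharie drainage."""
    if len(reachcode) != 14 or not reachcode.isdigit():
        raise ValueError("expected a 14-digit NHD reach code")
    return reachcode[:8]

tags = {"waterway": "stream",
        "reachcode": "02020005000123",
        "NHD:FCode": "46006",
        "NHD:Resolution": "High"}
print(clean_nhd_tags(tags))
print(huc8(tags["reachcode"]))  # 02020005
```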
Should irrelevant tagging on NHD objects be stripped? Maybe, although
the diffuse cost of the database space and network bandwidth to retain
and exchange it doesn't keep me awake at night. Please keep
'reachcode', though, I use that one!
Should the presence of the irrelevant tagging cause the underlying
objects to be removed? Please don't. The data aren't perfect, but we
don't live in an ideal world. NHD's data quality is variable, but
where it's good, it's very good, and even where it's bad, it's often
better than anything else we're ever going to get our hands on.
Are the data obsolete? Sure - but obsolete data about stable features
are almost as good as up-to-date data about the same stable features.
Would I import them wholesale today? Surely not. That's primarily
because of the controversy that would ensue. I'm no stranger to
controversy, but I respect the community consensus that large-scale
imports are a matter of last resort, and forgo them where there are
significant technical counterarguments. (I ignore the arguments that
are based solely on contentions that "imports are always bad for the
community," or else I'd never import anything.)