[Imports] United States Poultry Import

Fri Apr 22 15:09:23 UTC 2022

TL;DR:  I disagree with Martin that TIGER impeded the growth of the OSM
community, because that doesn't fit my personal experience. I agree with
him that imports are somewhat less valuable and phenomenally more difficult
than most first-time importers presume. With those caveats, imports still
have their place in the OSM ecosystem.

On Thu, Apr 21, 2022 at 5:00 AM Martin Koppenhoefer <dieterdreist at gmail.com>
wrote:

> this is an open question, we can't say for sure what would have happened
> "if", but the slow growth of the US community in the early years was
> attributed by some people to the debt created with the TIGER roads import,
> which basically left such a mess that it (eventually) made many people stay
> away. Generally, an active community of people is much more important than
> the data, because if nobody maintains the data it becomes stale and
> therefore unreliable and consequently useless. On the other hand, with a
> sufficient number of people on the ground we can recreate any data in short
> time. People tend to feel more responsible for data they have entered
> themselves, with respect to data they have taken from someone else. They
> feel that they "own" the data and feel compelled to care for it.
>

My personal experience:

I certainly looked at OSM before the TIGER import - and was not terribly motivated
to join.  The map for 100 km around me consisted of three motorways - that
was it; even the major rivers were absent. By contrast, there was so-so
government data available. I did use TIGER, but I also made use of things
like the digital rasters from my state's DOT, which were of a quality
comparable to USGS maps at the same scale. Until the 1990s, the US had
available, entirely free of copyright, a phenomenal series of large-scale
(1:24000) maps. The effort was defunded in the first Bush administration,
and the maps are now quite stale or of much lower data quality.

Given what was available to me then, I reasoned, "why should I bother with
OSM, there's nothing there!" and didn't join the project until much later.
My speculation is that there were communities outside the US that did not
have such things available. In the UK, for instance, I understand that
Ordnance Survey maps were expensive and subject to Crown copyright, making
a fully open map a much more valuable resource by comparison. Moreover, we
all know of places where accurate maps have historically been closely
guarded state secrets, so "map it yourself" was the only available option,
and a fraught one at that.

The TIGER import changed the picture for me.  Now, looking at OSM, I had
not-too-bad data - stuff that I could use, and where stuff was missing, I
could add it!  It was also "one-stop shopping"; no longer would I have to
worry about conflation every time I looked at doing another mapping
project, because things could get conflated on the way into OSM and stay
conflated.  Even then, because of how the data were imported, I was quite
scared about correcting TIGER's errors. All the 'TIGER:*' keys looked like
foreign database keys to me, and I was afraid I'd be breaking external
linkage if I did anything but change the base geometry of objects. Of
course, now I know better, and feel free to delete all that mess whenever
I'm editing a TIGER-imported object.

Anyone who knows me on the project knows that I grumble as much as anyone
about "cleaning the cat box" from all the stuff that TIGER left behind -
but I'm an old man and like to grumble.  If the TIGER import hadn't been
done, along with numerous others, I think I'd still be looking at huge
empty patches with no motivation to fill them in.

> It is also generally much better to have fewer and reliable data (i.e.
> things missing) than wrong data, because wrong data is much harder to
> detect than missing data. Missing data is often apparent, you see that the
> data is not complete, and can draw your clues (and can at least rely on
> what _is_ there), while you will not see if the data is simply wrong, you
> will probably run into problems with wrong data, and if it is too much of
> it, you will likely turn away. On the other hand, for many applications
> incomplete data is still useful.
>

I definitely fall into the "when in doubt, DON'T import" camp nowadays -
and yet, to some others on the project, I look like a cowboy about imports.
But when I look back at what I've imported, there are a few major rules
that I seem to have followed. Some of these are more or less codified in the
import guidelines. (Perhaps others should be. I'm not about to make any
such edits to the guidelines, though. I'm not nearly good enough at OSM
politics.)

1. The data must be better than what's already on the map.  I respect
mappers and don't import anything over their work.  When I imported New
York State Park boundaries, I must have consulted fifty mappers who had
already mapped what purported to be park boundaries.
In all but a couple of cases, the reply was, "I just sketched in my best
guess. If you have authoritative data from someone's GIS, by all means
adjust what I did!" I rejected public data sets of hiking, MTB, horse, ski
and snowmobile trails out of hand, because I could see that their error
rate was too high or that they had been digitized at an inappropriate
scale.  (I do render maps for myself, with those data sets rendered
distinctively as a "to do" list.)

2. One motivator for doing an import is the difficulty of field survey.
I've done cadastral imports of state parks, state forest land (New York's
public-access forests, together, comprise an area greater than the entire
land area of Massachusetts), New York City watershed recreation areas, at
least one NGO's nature reserves, and so on. All these cases were
practically poster children for "won't ever be mapped by another means".
I've recovered survey lines in the past. It's far too laborious for our
project ever to do that at scale.

3. Another motivator for doing an import is that it's cleaning up from an
earlier import - particularly if it's scripted so as not to overlay data
that have been hand-edited since the earlier import.  The NY state forest
lands had been imported previously, and it was a "high quality" import for
the time, but by no means was it up to what we'd expect today. (Numerous
topological problems, boundaries of adjoining areas simplified
inconsistently, and so on.)  A recent project that I completed was a
mechanical edit of about 130,000 building addresses - which had been
damaged systematically by an earlier, undocumented import of MS building
footprints conflated with E911 address points from NYSGIS. I have a current
long-term project of remapping the minor civil division boundaries in my
state - because the ones that came in from TIGER were disastrous.
Essentially, the Census Bureau didn't care about putting roads, farms,
forests and waterways on the wrong side of boundaries, because its concern
is with counting people.  That's also where the weird road alignments in
TIGER came from: digitizing pencil sketches that census takers made,
indicating how to get to houses.  The universal use of TIGER as the
government's street map was not something it had originally been intended
for!

4. My imports are never done blindly. Even with the 130,000 address points,
each of the 600 or so changesets got at least a cursory review.  Because
the errors were systematic, this was easier than it sounds. For instance,
there might be a street where all the street addresses had "West Main
Street" truncated to "Main".  It was pretty easy to eyeball the change and
say, "Yeah, all those houses are on West Main Street," and bless it. The
script also checked in that case to make sure that no mapper had edited the
street name since the import.  Addresses that had been hand-edited got much
closer scrutiny, but there were only a handful among the 130,000.  I'm sure
that the visual and
automated checks were imperfect, but if a few hundred bad addresses slipped
in, that's better than a hundred thousand that were already there.  With
cadastral imports, my scripts present me with not only the proposed change,
but all polygons that overlap it and are in potential conflict (plus all
polygons that appear as if they may be part of a previous version).
Everything's at most a 'manual edit done with mechanical assistance' rather
than what the Wiki pages on 'mechanical edits' appear to assume.

5. Another factor to consider is how much an import impacts everything
else.  Importing roads, or even waterways, at this late date, would be a
non-starter. There's too much there that would have to be conflated, and
the import would have to be topologically integrated.  Importing the
cadastre of watershed recreation areas, by contrast, was relatively
straightforward. The boundary polygons interacted with almost nothing else
on the map; before the import, they weren't there; afterwards, they were.

6. I don't generally commence an import without at least a rough plan for
how to update it.  The forest and recreation-area mapping that I've done
has been through several refresh cycles already. Essentially,
the same tools that I used to check for conflict on the initial import can
be used to identify the earlier imported version on an update.  The update
process must tolerate having the imported data hand-edited in between
update cycles.

7. I import only if I can field-check at least part of the imported data. I
may not be familiar with the whole geographic scope of the data set, but I
can at least check whether it appears to be of decent quality in an area
where I have literal boots on the ground.

8. If you import over top of data that I've hand-curated, and it comes as a
surprise to me, then you have not followed the import guidelines (because I
monitor places where you're supposed to announce your import). I'll get
quite annoyed, and you can expect some uncomfortable questions from the DWG.
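For readers who haven't written this kind of tooling, the scripted checks described in points 3, 4 and 6 can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the scripts actually used for these imports: the `ImportedObject` record, all function names, and the approach of comparing tag values against what the earlier import wrote are hypothetical, and real input would come from the OSM API or an extract rather than hand-built lists. The polygon check is only a bounding-box prefilter; an exact geometry test would have to follow it.

```python
from collections import defaultdict
from dataclasses import dataclass, field


# Hypothetical record for an object touched by an earlier import: the tags
# that import wrote, and the tags the object carries on the live map today.
@dataclass
class ImportedObject:
    osm_id: int
    imported_tags: dict = field(default_factory=dict)
    current_tags: dict = field(default_factory=dict)


def split_by_edit_status(objects, keys):
    """Point 3: partition object ids into those whose `keys` still hold the
    values the earlier import set (safe to correct by script) and those a
    mapper has changed since (set aside for manual review)."""
    untouched, hand_edited = [], []
    for obj in objects:
        if all(obj.current_tags.get(k) == obj.imported_tags.get(k) for k in keys):
            untouched.append(obj.osm_id)
        else:
            hand_edited.append(obj.osm_id)
    return untouched, hand_edited


def group_fixes(fixes):
    """Point 4: group (osm_id, old_value, new_value) corrections by the
    (old, new) pair, so one systematic fix such as "Main" -> "West Main
    Street" can be reviewed and approved as a single batch."""
    groups = defaultdict(list)
    for osm_id, old, new in fixes:
        if old != new:  # skip no-op "corrections"
            groups[(old, new)].append(osm_id)
    return dict(groups)


def bbox(ring):
    """Bounding box (min_lon, min_lat, max_lon, max_lat) of a ring of
    (lon, lat) points."""
    lons = [lon for lon, _ in ring]
    lats = [lat for _, lat in ring]
    return min(lons), min(lats), max(lons), max(lats)


def overlap_candidates(proposed_ring, existing):
    """Points 4 and 6: ids of existing polygons whose bounding boxes meet
    the proposed polygon's. Boxes that merely touch count too, because
    adjoining parcels share edges. Coarse prefilter only."""
    a = bbox(proposed_ring)
    hits = []
    for osm_id, ring in existing.items():
        b = bbox(ring)
        if a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]:
            hits.append(osm_id)
    return sorted(hits)


if __name__ == "__main__":
    objs = [
        ImportedObject(1, {"addr:street": "Main"}, {"addr:street": "Main"}),
        ImportedObject(2, {"addr:street": "Main"}, {"addr:street": "West Main Street"}),
    ]
    print(split_by_edit_status(objs, ["addr:street"]))  # ([1], [2])
    print(group_fixes([(101, "Main", "West Main Street"),
                       (102, "Main", "West Main Street"),
                       (103, "Elm", "Elm Street")]))
    square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
    print(overlap_candidates(square, {
        10: [(1, 1), (3, 1), (3, 3), (1, 3), (1, 1)],
        11: [(5, 5), (6, 5), (6, 6), (5, 6), (5, 5)],
    }))  # [10]
```

Keying the hand-edit check on specific tag values, rather than on an object's version number, is a deliberate choice in this sketch: in OSM a way's version does not change when one of its nodes is moved, and a version bump caused by an unrelated tag edit would needlessly divert an object to manual review.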

Imports are very difficult.  Fixing botched imports is even more difficult.
I must discard ten data sets for every one that I'll consider importing.
First-time importers always underestimate just how grueling it is to get
them right. I think every one I've done has been tens, if not hundreds, of
hours of my time - but I estimate that they would have saved many thousands
of hours of other mappers' time, even factoring in the fixes for the
inevitable errors.