[Imports] Import of Allegheny County, Pennsylvania data

Greg Troxel gdt at ir.bbn.com
Tue Feb 23 02:21:41 UTC 2016


I'm speaking from the perspective of greybeard-advising and helping with
an import of building footprints from MassGIS, in which Jason Remillard
did almost all of the prep work, and multiple people (basically all the
active mappers in Mass who answered our messages) helped with checking
the proposed upload files.  Please note that I am generally on the
pro-import side of things, so even though I have a lot of pointed
questions I'm not trying to discourage you -- just to point out where
there is work to do in defining the proposal to the point where this
group could address it.  I am also not speaking from any particular
position of authority, merely as an imports list member and one of the
active mappers in Mass (who knows most of the other active mappers by
email and some in person, at least the non-reclusive ones).

It's great that your county is willing to make data available under a
compatible license.  Even if you don't continue down the import path,
openly publishing data under a clear license is a huge step, because it
enables anyone in OSM to begin thinking about imports, or examining
individual details.

Imports should be done by experienced mappers.  Are the people who will
be doing this already mappers, who have learned JOSM and done
significant mapping (at least 100 changesets over several months)?  (The
Mass building import group had several thousand changesets and over 10
years of editing spread over a half dozen people; that felt about right
to be doing an import.)  This is particularly important since in OSM
there is no simple replacing of layers with a better dataset for that
layer, and this becomes far clearer after working with editors for a
while.

Imports are required to have community support.  Do you know who the
active mappers are in your county or state?  Have you met/emailed with
them?  What is their opinion of this?  Are they willing to help with QA?

You are proposing a very large scale import (really 5 of them), some of
which (roads, buildings) have very significant issues with conflation of
existing data.  Park boundaries and trails are probably less difficult,
because I'm guessing 100% human review is possible, and hydrology may or
may not be tricky depending on how much there is.  Still, there is a
strong norm about not overwriting hand mapping with imported data, and
the import workflow will have to be organized around respecting that
norm.

You will likely have to set up 5 wiki pages to describe the 5 imports.
The first issue is licensing.  Just 'publicly available' is not
enough; you will have to have the agency state that the data is public
domain or have an explicit license that meets the contributor terms.

In addition to the original data being available, you should prepare to
publish the source code of how you translate that data to osm files, and
how you prepare change sets that would be uploaded.  That might be a
program that loads OSM data into postgis, loads the county data, and
produces change files.  Part of this code is explaining how all the
attributes in the (presumably) shapefiles are turned into osm tags (or
omitted, like county unique identifiers).  You should also think about
how/if future updated county datasets would be handled -- is there some
notion of incremental updates?
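
As a rough illustration of the attribute-to-tag step described above, here
is a minimal sketch.  The attribute names (NAME, HEIGHT_M, COUNTY_ID,
OBJECTID) and the function name are hypothetical, not the county's actual
schema; the point is only that the mapping and the deliberate omissions are
written down in one reviewable place, and that unmapped attributes are
reported rather than silently dropped.

```python
# Hypothetical source attribute names; the real county schema will differ.
TAG_MAP = {
    "NAME": "name",
    "HEIGHT_M": "height",
}

# Source attributes deliberately kept out of OSM (e.g. county-internal IDs).
DROPPED = {"COUNTY_ID", "OBJECTID"}


def translate_attributes(attrs, base_tags):
    """Return (osm_tags, unmapped_keys) for one source feature."""
    tags = dict(base_tags)
    unmapped = []
    for key, value in attrs.items():
        if key in DROPPED:
            continue
        # Skip empty values instead of writing empty tags.
        if value is None or str(value).strip() == "":
            continue
        if key in TAG_MAP:
            tags[TAG_MAP[key]] = str(value).strip()
        else:
            # Surface anything the mapping does not cover for human review.
            unmapped.append(key)
    return tags, sorted(unmapped)
```

Publishing a table like TAG_MAP alongside the wiki page would let reviewers
comment on each tagging choice before anything is uploaded.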

(In Mass, we had these change files (add only, actually) broken up by
town, as this is how the source data was.  The basic plan was to take
each building polygon from MassGIS and to put it in the output file only
if it did not overlap any building in OSM.  That resulted in some
non-imported buildings, but did not overwrite any hand-drawn work.  In
the end, we found that all was well with this import, except that a few
tents were added as buildings which we clean up as we notice, probably
under a dozen out of 2E6 buildings imported.)
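
The add-only rule above can be sketched as follows.  This is not the code
used for the Mass import (that is in the repository linked further down);
it is a simplified illustration that assumes each building is a list of
(x, y) points and uses a bounding-box test as a conservative stand-in for a
true polygon intersection.  A real run would more likely do the overlap
test in postgis.

```python
def bbox(polygon):
    """Bounding box (min_x, min_y, max_x, max_y) of a list of (x, y) points."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def bboxes_overlap(a, b):
    """True if two bounding boxes share any area or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def add_only(source_buildings, osm_buildings):
    """Split source polygons into (to_import, skipped).

    A source building is skipped whenever its bounding box overlaps the
    bounding box of any existing OSM building, so hand-drawn work is
    never touched.
    """
    osm_boxes = [bbox(p) for p in osm_buildings]
    to_import, skipped = [], []
    for poly in source_buildings:
        box = bbox(poly)
        if any(bboxes_overlap(box, other) for other in osm_boxes):
            skipped.append(poly)
        else:
            to_import.append(poly)
    return to_import, skipped
```

Because bounding boxes are larger than the polygons they enclose, this
version skips somewhat more buildings than an exact test would; it errs on
the side of not importing.  The linear scan over osm_boxes is fine for a
sketch but would need a spatial index at county scale.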

For buildings, you could use a similar approach to what we did in Mass.
You could also output some other kind of difference file for buildings
that do overlap, to understand what's different, and perhaps set that up
for some sort of maproulette challenge, but that's harder and I'd put it
off to a second round.  Here are some pointers/examples and code, which
could be reused significantly if not entirely:
  https://wiki.openstreetmap.org/wiki/MassGIS_Buildings_Import
  https://github.com/jremillard/osm_building_import
Note that the import guidelines and demand for rigor have strengthened
since then, so while what we did is still ok, you may have to make a
clearer explanation of choices and consequences.

For roads, surely you will find that TIGER roads are already there.  For
roads not touched since the TIGER import, a way to adjust geometry and
attributes from the better data sounds conceptually reasonable.
However, this is easier said than done (especially not breaking
connectivity at roads that cannot be updated), and you will definitely
have to publish the code so that others can run it and look at the
proposed changes.  This is almost certainly the hardest thing of the
list you propose.
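
One possible first step is to separate roads that look untouched since the
TIGER import from everything else.  The sketch below is only a heuristic
under stated assumptions: ways are represented as plain dicts with
"version" and "tags" keys (a hypothetical representation, not any
particular library's), and "untouched" is taken to mean version 1 with the
tiger:reviewed=no tag still present.  Ways edited only by bots would be
excluded by this strict test, and the harder problem of preserving
connectivity is not addressed at all.

```python
def untouched_since_tiger(way):
    """Heuristic: still at version 1 and still flagged tiger:reviewed=no."""
    tags = way.get("tags", {})
    return way.get("version") == 1 and tags.get("tiger:reviewed") == "no"


def partition_roads(ways):
    """Split ways into (candidates, hands_off).

    candidates: ways that look untouched since the TIGER import and might
                be eligible for geometry/attribute updates.
    hands_off:  everything else, which should be left to human mappers.
    """
    candidates = [w for w in ways if untouched_since_tiger(w)]
    hands_off = [w for w in ways if not untouched_since_tiger(w)]
    return candidates, hands_off
```

Publishing the resulting candidate list, along with the code, would let
local mappers check whether the heuristic matches what they see on the
ground before any change files are prepared.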

Buildings may be the best thing to start with, as adding high-quality
non-overlapping buildings raises the fewest issues.

Trails data, once published, might also be integrated incrementally by
existing mappers, depending on how much there is.  It's probably big;
my single town alone has over 100 trails, so the county is probably
at 5K.

I can't emphasize enough that publishing the data with a clear license
is the first step, and that enables others to help.  I suspect that a
number of imports@ people don't really want to put time into helping
until that's done and the license is known to be ok and the data can be
looked at by some others.

Greg (osm user gdt)