[Imports] A few questions about importing U.S. NHD data for Subbasin 01080105 (White River in VT)

Thu Feb 16 05:24:57 UTC 2012

Hello All,

I am working on preparing to import subbasin 01080105 of National
Hydrography Dataset data
(http://wiki.openstreetmap.org/wiki/National_Hydrography_Dataset). This is
the White River subbasin in Vermont. I think I have done a pretty decent
job of preparing the data to import, by simplifying shapes, using ogr2osm
(which handles duplicate nodes) and breaking up large areas into more
manageable sizes. I am fortunate (as far as the import goes, at least...)
that there are few water features already mapped in this area, and I am
taking steps to avoid duplicating or overwriting them. In addition, the
mapping from the source data to the stream vs. river distinction seems
fairly straightforward.

My primary concern is importing the data in a format which will be easiest
to work with even for new OpenStreetMappers. I don't want to import any
more tags than are necessary, as I have worked with some imported data that
in my opinion has too many tags with incomprehensible values
(tiger:separated, tiger:upload_uuid, tiger:tlid, etc. for TIGER, and probably
better examples in the MassGIS imports around Boston). I tend to believe that
these extra tags simply confuse mappers or slow them down, and may even
have a negative effect on the OSM community. I hope the result follows
K.I.S.S. as much as possible.

So these are my ideas about which I am asking for some validation or
suggestions:

- The waterbody data in the source contains a field 'AreaSqKm'. This is the
surface area of waterbodies. This seems pointless to import as it could
presumably be calculated from the shape itself, and will (hopefully!)
become wrong and out of date when waterbody areas are updated by users.

- There are a few tags in the source data which have the exact same
key/value pair for every single feature. FDate (date of last feature
modification) is one of these fields. I believe these tags would be better
added to the wiki page of the dedicated import user and not added as tags
to the features. A few other fields also seem unnecessary as tags:
--- Resolution - value is High for every single feature
--- FTYPE - There are only two values across the source data, which are
also covered by FCode.
--- FCode - this seems like it will be completely useless to mappers once
the data is in OSM.

- My upload test plan is to upload to one of the dev servers and see if it
works. I think I can then wait for and watch the watershed appear on the
dev map. At that point I will attempt to roll back/revert the upload on
that dev server. I will then watch the rivers disappear from the map. Will
that be sufficient?
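
The constant-tag check described in the list above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions: the
sample XML stands in for ogr2osm output, the tag values (dates, ReachCodes,
area) are made up, and the helper names constant_tags and strip_tags are
hypothetical, not part of ogr2osm or any upload tool.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up sample standing in for ogr2osm output; all values are illustrative.
SAMPLE_OSM = """<osm version="0.6">
  <way id="-1">
    <nd ref="-10"/><nd ref="-11"/>
    <tag k="waterway" v="stream"/>
    <tag k="FDate" v="2011-01-01"/>
    <tag k="Resolution" v="High"/>
    <tag k="ReachCode" v="01080105000001"/>
  </way>
  <way id="-2">
    <nd ref="-11"/><nd ref="-12"/>
    <tag k="waterway" v="river"/>
    <tag k="FDate" v="2011-01-01"/>
    <tag k="Resolution" v="High"/>
    <tag k="ReachCode" v="01080105000002"/>
  </way>
  <way id="-3">
    <nd ref="-20"/><nd ref="-21"/><nd ref="-22"/><nd ref="-20"/>
    <tag k="natural" v="water"/>
    <tag k="FDate" v="2011-01-01"/>
    <tag k="Resolution" v="High"/>
    <tag k="AreaSqKm" v="0.012"/>
  </way>
</osm>"""


def constant_tags(root):
    """Return {key: value} for keys present on every tagged element
    with exactly one distinct value."""
    values = defaultdict(set)
    counts = defaultdict(int)
    tagged = 0
    for elem in root:
        tags = elem.findall("tag")
        if not tags:
            continue  # untagged nodes carry no information here
        tagged += 1
        for tag in tags:
            values[tag.get("k")].add(tag.get("v"))
            counts[tag.get("k")] += 1
    return {
        key: next(iter(vals))
        for key, vals in values.items()
        if len(vals) == 1 and counts[key] == tagged
    }


def strip_tags(root, keys):
    """Remove every tag whose key is in `keys`; return how many were removed."""
    removed = 0
    for elem in root:
        for tag in elem.findall("tag"):  # findall returns a list, safe to mutate
            if tag.get("k") in keys:
                elem.remove(tag)
                removed += 1
    return removed


root = ET.fromstring(SAMPLE_OSM)
constant = constant_tags(root)
# AreaSqKm is dropped as well because it is derivable from the geometry.
removed = strip_tags(root, set(constant) | {"AreaSqKm"})
print(sorted(constant), removed)
```

The constant key/value pairs returned by constant_tags are the ones that
could be recorded once on the import user's wiki page instead of on every
feature.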

And my questions are:

- Features in the import data generally have two identifying fields,
ReachCode and ComID. They both seem to be defined as nationally unique
identifiers, with the ComID as the 'National Database Key'. However, I
suspect that soon after importing, either I or others will want to combine
ways sharing the same ReachCode (which are adjacent) but with different
ComIDs. The main rivers are composed of a lot of short segments with just a
few nodes each (rivers are split into separate polylines every time
waterways come together). Because the ComIDs identify the individual
polylines in the source data, this means that ComIDs will not map cleanly
and would either be appended together or lost (like tiger:tlid from what I
can tell) when ways are joined. What is the utility of these IDs once the
data is in OSM? Would it be better to just not import ComIDs to begin with?

- Any suggestions on which tool to use to actually upload the data? I will be
uploading roughly 175,000 nodes, 7520 ways and 20 relations. The combined
OSM file is about 19 MB. Is JOSM a viable option in this case? It seems
like the two best options are upload.py and bulk_upload.py (which I would
be comfortable writing a script for). I will most likely be uploading on a
typical low/medium speed residential cable connection.
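
To make the ComID question above concrete, here is a rough Python sketch of
what joining adjacent ways that share a ReachCode does to their ComIDs. It
assumes each ReachCode forms a single unbranched chain of segments and uses
made-up IDs and node references; the function names are hypothetical. The
ComIDs are concatenated with semicolons, the usual OSM convention for
multiple values in one tag, which is one of the two outcomes mentioned above
(the other being to drop them).

```python
from collections import defaultdict


def order_chain(segments):
    """Order segments so each one starts where the previous one ends.
    Assumes a single unbranched chain; raises ValueError otherwise."""
    by_start = {seg["nodes"][0]: seg for seg in segments}
    end_nodes = {seg["nodes"][-1] for seg in segments}
    heads = [seg for seg in segments if seg["nodes"][0] not in end_nodes]
    if len(heads) != 1 or len(by_start) != len(segments):
        raise ValueError("segments do not form a single unbranched chain")
    chain = [heads[0]]
    while len(chain) < len(segments) and chain[-1]["nodes"][-1] in by_start:
        chain.append(by_start[chain[-1]["nodes"][-1]])
    if len(chain) != len(segments):
        raise ValueError("segments do not form a single unbranched chain")
    return chain


def join_by_reachcode(ways):
    """Merge ways sharing a ReachCode; ComIDs become a ';'-separated list."""
    groups = defaultdict(list)
    for way in ways:
        groups[way["ReachCode"]].append(way)
    merged = []
    for reach_code, segments in groups.items():
        chain = order_chain(segments)
        nodes = list(chain[0]["nodes"])
        for seg in chain[1:]:
            nodes.extend(seg["nodes"][1:])  # skip the shared junction node
        merged.append({
            "ReachCode": reach_code,
            "ComID": ";".join(seg["ComID"] for seg in chain),
            "nodes": nodes,
        })
    return merged


# Made-up segments, deliberately listed out of order.
ways = [
    {"ReachCode": "01080105000001", "ComID": "1003", "nodes": [4, 5, 6]},
    {"ReachCode": "01080105000001", "ComID": "1001", "nodes": [1, 2, 3]},
    {"ReachCode": "01080105000002", "ComID": "2001", "nodes": [10, 11]},
    {"ReachCode": "01080105000001", "ComID": "1002", "nodes": [3, 4]},
]

for way in join_by_reachcode(ways):
    print(way["ReachCode"], way["ComID"], way["nodes"])
```

After the join, a single way carries several ComIDs, so the one-to-one link
back to the source polylines is already gone; the ReachCode survives intact.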

Anyway, sorry I can be verbose! :) I have my data in near finished form on
Dropbox if anyone would like to review it. I am also planning to move my
process documentation to a wiki page in the near future.

Thank you very much!

Scott