[Imports] A few questions about importing U.S. NHD data for Subbasin 01080105 import (White River in VT)

Thu Feb 23 07:52:20 UTC 2012

It would be good to post the .osm file and the translation file used to map
the NHD attributes to OSM tags.

An upload of this size is a bit large for JOSM, but it might work if you
split the data into smaller areas. JOSM is well suited to imports where you
spend a lot of time merging with existing data, and to import cleanups,
where an interrupted upload can be resumed cleanly.

From: probiscus [mailto:probiscus12+osm_imports at gmail.com] 
Sent: Wednesday, February 22, 2012 8:27 PM
To: imports at openstreetmap.org
Subject: [Imports] A few questions about importing U.S. NHD data for
Subbasin 01080105 import (White River in VT)

Hello All,

I am preparing to import subbasin 01080105 of the National Hydrography
Dataset (NHD)
(http://wiki.openstreetmap.org/wiki/National_Hydrography_Dataset). This is
the White River subbasin in Vermont. I think I have done a pretty decent job
of preparing the data for import: simplifying shapes, running it through
ogr2osm (which handles duplicate nodes), and breaking up large areas into
more manageable pieces. I am fortunate (as far as the import goes, at least)
that few water features are already mapped in this area, and I am taking
care not to duplicate or overwrite them. In addition, the mapping from the
source data to the stream vs. river distinction seems fairly straightforward
and obvious.

My primary concern is importing the data in a format that will be easy to
work with, even for new OpenStreetMappers. I don't want to import any more
tags than necessary: I have worked with some imported data that, in my
opinion, has too many tags with incomprehensible values (tiger:separated,
tiger:upload_uuid, tiger:tlid, etc. for TIGER, and probably better examples
in the MassGIS imports around Boston). I tend to believe that these extra
tags simply confuse or slow down mappers and may even have a negative effect
on the OSM community. I hope the result follows K.I.S.S. as much as
possible.

So these are my ideas about which I am asking for some validation or
suggestions:

- The waterbody data in the source contains a field 'AreaSqKm', the surface
area of each waterbody. Importing it seems pointless: it can presumably be
calculated from the shape itself, and it will (hopefully!) become wrong and
out of date as users refine the waterbody outlines.

- A few fields in the source data have exactly the same value for every
single feature. FDate (date of last feature modification) is one of them. I
believe such values would be better documented on the wiki page of the
dedicated import user than added as tags to the features. A few other
fields that I would also leave out:
--- Resolution - the value is High for every single feature
--- FTYPE - there are only two values across the source data, and both are
also covered by FCode
--- FCode - this seems like it will be completely useless to mappers once
the data is in OSM

- My test plan is to upload to one of the dev servers and check that it
works. I can then wait for the watershed to appear on the dev map, attempt
to roll back/revert the upload on that dev server, and watch the rivers
disappear from the map. Will that be sufficient?
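The field triage above can be checked mechanically. Below is a minimal sketch, assuming the attributes have already been read into one dict per feature (reading the NHD files is not shown). `constant_fields` lists fields that carry a single value for every feature, such as FDate and Resolution, and `filter_tags` uses a hypothetical whitelist so only explicitly chosen fields become tags. The `GNIS_Name` to `name` mapping is an assumption about the NHD schema, stream vs. river classification is deliberately left out, and a function of this shape would still need adapting to whatever hook ogr2osm's translation files actually expect.

```python
from typing import Dict, Iterable, Set

# Hypothetical whitelist: NHD source field -> OSM key. Every field not
# listed (AreaSqKm, FDate, Resolution, FType, FCode, ...) is dropped.
KEEP: Dict[str, str] = {"GNIS_Name": "name"}


def constant_fields(records: Iterable[Dict[str, str]]) -> Set[str]:
    """Fields present in every record with exactly one distinct value."""
    records = list(records)
    values: Dict[str, Set[str]] = {}
    for rec in records:
        for key, value in rec.items():
            values.setdefault(key, set()).add(value)
    return {
        key for key, seen in values.items()
        if len(seen) == 1 and all(key in rec for rec in records)
    }


def filter_tags(attrs: Dict[str, str]) -> Dict[str, str]:
    """Translate one feature's attributes to OSM tags via the whitelist.
    Stream vs. river classification is intentionally omitted here."""
    tags: Dict[str, str] = {}
    for src_field, osm_key in KEEP.items():
        value = (attrs.get(src_field) or "").strip()
        if value:  # skip empty names rather than tagging name=""
            tags[osm_key] = value
    return tags
```

A whitelist is used rather than a blacklist of the fields listed above, so any field nobody has decided on stays out of OSM by default, in keeping with the K.I.S.S. goal.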

And my questions are:

- Features in the import data generally have two identifying fields,
ReachCode and ComID. Both seem to be defined as nationally unique
identifiers, with ComID as the 'National Database Key'. However, I suspect
that soon after the import, I or others will want to combine adjacent ways
that share the same ReachCode but have different ComIDs. The main rivers are
composed of many short segments with just a few nodes each, because rivers
are split into separate polylines wherever waterways join. Since a ComID
identifies an individual polyline in the source data, ComIDs will not map
cleanly onto joined ways: they would either be concatenated or lost (as
tiger:tlid appears to be) when ways are joined. What is the utility of these
IDs once the data is in OSM? Would it be better not to import ComIDs at all?

- Any suggestions on which tool to use for the actual upload? I will be
uploading roughly 175,000 nodes, 7,520 ways and 20 relations. The combined
OSM file is about 19 MB. Is JOSM a viable option in this case? It seems like
the two best options are upload.py and bulk_upload.py (which I would be
comfortable writing a script for). I will most likely be uploading over a
typical low/medium-speed residential cable connection.
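To see why ComIDs do not survive a join, here is a minimal sketch that merges segments sharing a ReachCode into one way per reach. It assumes each reach is an unbranched chain whose segments are digitized head to tail (the end node of one segment is the start node of the next). `Segment`, `MergedWay` and `merge_by_reachcode` are hypothetical names, not part of ogr2osm or JOSM.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Segment(NamedTuple):
    comid: str
    reachcode: str
    nodes: List[int]  # node ids in digitized order


class MergedWay(NamedTuple):
    reachcode: str
    comids: List[str]  # every source ComID absorbed into this way
    nodes: List[int]


def merge_by_reachcode(segments: List[Segment]) -> List[MergedWay]:
    """Join segments that share a ReachCode into chains of ways.

    Assumes each reach is unbranched and digitized head to tail."""
    groups: Dict[str, List[Segment]] = defaultdict(list)
    for seg in segments:
        groups[seg.reachcode].append(seg)

    merged: List[MergedWay] = []
    for reachcode, segs in groups.items():
        by_start = {s.nodes[0]: s for s in segs}
        ends = {s.nodes[-1] for s in segs}
        # A chain starts at a segment whose first node no other segment ends at.
        for head in (s for s in segs if s.nodes[0] not in ends):
            comids, nodes = [head.comid], list(head.nodes)
            # The length guard stops the walk if the data contains a loop.
            while nodes[-1] in by_start and len(comids) < len(segs):
                nxt = by_start[nodes[-1]]
                comids.append(nxt.comid)
                nodes.extend(nxt.nodes[1:])  # skip the shared junction node
            merged.append(MergedWay(reachcode, comids, nodes))
    return merged
```

Each merged way ends up with a list of ComIDs, so a single `ComID` tag would have to be concatenated or dropped, whereas the ReachCode stays single-valued for the whole way.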

Anyway, sorry for being verbose! :) My data is in near-finished form on
Dropbox if anyone would like to review it. I am also planning to move my
process documentation to a wiki page in the near future.

Thank you very much!

Scott
