[Imports] A few questions about importing U.S. NHD data for Subbasin 01080105 import (White River in VT)

Alan Mintz Alan_Mintz+OSM at Earthlink.Net
Thu Feb 23 09:15:05 UTC 2012


At 2012-02-22 20:27, probiscus wrote:
>I am working on preparing to import subbasin 01080105 of National
>Hydrography Dataset data (
>http://wiki.openstreetmap.org/wiki/National_Hydrography_Dataset) ...
>My primary concern is importing the data in a format which will be easiest
>to work with even for new OpenStreetMappers. I don't want to import any
>more tags than are necessary, as I have worked with some imported data that
>in my opinion have too many tags with incomprehensible data
>(tiger:separated, tiger:upload_uuid, tiger:tlid, etc for TIGER and probably
>better examples in MassGIS imports around Boston). I tend to believe that
>these extra tags simply confuse or slow down the mapping process for users
>and may even have a negative effect on the OSM community.

I disagree. You don't need to understand what every tag means, particularly 
those carried over from import sources. Sometimes it's not immediately 
obvious what to do with some of the information during import, so it can be 
useful to carry it into the OSM database. A unique foreign key back to the 
original database (e.g. tiger:tlid) certainly makes future updates more 
practical, and it also makes it possible to fix problems in the import that 
aren't noticed until afterwards.
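A minimal sketch of why the foreign key matters for future updates. It assumes each imported way carries its source ID in a tag (NHD:ComID is used here as a hypothetical tag name) and that only a chosen subset of attributes is compared. The record layout and the function name are illustrative assumptions, not part of any existing import tool.

```python
from typing import Dict, Set, Tuple

# Hypothetical layout: {source ID (e.g. the NHD:ComID tag value): {attribute: value}}
# holding only the attributes we want to compare, not every OSM tag.
Records = Dict[str, Dict[str, str]]


def diff_by_foreign_key(osm: Records, source: Records) -> Tuple[Set[str], Set[str], Set[str]]:
    """Compare imported OSM objects with a newer source extract by foreign key.

    Returns (added, removed, changed): IDs only in the new extract, IDs only
    in OSM, and IDs present in both whose compared attributes differ.
    """
    osm_ids = set(osm)
    src_ids = set(source)
    added = src_ids - osm_ids
    removed = osm_ids - src_ids
    changed = {i for i in osm_ids & src_ids if osm[i] != source[i]}
    return added, removed, changed
```

Without a stored key, the same comparison would have to fall back on geometric matching, which is much less reliable once mappers have edited the ways.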


>- The waterbody data in the source contains a field 'AreaSqKm'. This is the
>surface area of waterbodies. This seems pointless to import as it could
>presumably be calculated from the shape itself, and will (hopefully!)
>become wrong and out of date when waterbody areas are updated by users.

Good. I've seen other imports needlessly bring in dimension and location 
information.
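For what it's worth, the area really is recoverable from the geometry. A minimal sketch using the shoelace formula, assuming the outer ring has already been projected into metres (e.g. a UTM zone); raw latitude/longitude would need projecting first, and inner rings (islands) of a multipolygon are ignored here. The function name is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in metres, in an assumed projected CRS


def polygon_area_sq_km(ring: List[Point]) -> float:
    """Area of a simple polygon ring via the shoelace formula, in km^2.

    The ring may be open or closed (first vertex repeated at the end).
    Holes are not handled; subtract inner-ring areas separately.
    """
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]  # drop the duplicated closing vertex
    n = len(ring)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to the first vertex
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0 / 1_000_000.0  # m^2 -> km^2
```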


>- There are a few tags in the source data which have the exact same
>key/value pair for every single feature. FDate (date of last feature
>modification) is one of these fields. I believe these tags would be better
>added to the wiki page of the dedicated import user and not added as tags
>to the features. There are a few other fields in this category:
>--- Resolution - value is High for every single feature
>--- FTYPE - There are only two values across the source data, which are
>also covered by FCode.

Also good.
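Spotting such fields can be automated before deciding what goes on the wiki page. A sketch, assuming the features have been loaded as plain dicts of attribute name to value (e.g. from the shapefile's attribute table): `constant_fields` finds fields like FDate or Resolution that never vary, and `determined_by` checks whether one field (FTYPE) is fully implied by another (FCode). Both function names are hypothetical.

```python
from typing import Any, Dict, List


def constant_fields(features: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return {field: value} for fields that have the same value on every
    feature. A field missing from any feature is not treated as constant."""
    if not features:
        return {}
    first = features[0]
    return {
        key: value
        for key, value in first.items()
        if all(key in f and f[key] == value for f in features[1:])
    }


def determined_by(features: List[Dict[str, Any]], field: str, key_field: str) -> bool:
    """True if `field` is a function of `key_field` across all features,
    i.e. each value of key_field always comes with the same value of field."""
    seen: Dict[Any, Any] = {}
    for f in features:
        k, v = f.get(key_field), f.get(field)
        # setdefault stores v the first time k is seen, else returns the stored value
        if seen.setdefault(k, v) != v:
            return False
    return True
```

If `determined_by(features, "FTYPE", "FCode")` holds, FTYPE adds nothing once FCode (or tags derived from it) is kept.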


>--- FCode - this seems like it will be completely useless to mappers once
>the data is in OSM.

If this is the feature type (stream, river, etc.), unless there is a 1:1 
correspondence between FCode and the OSM tags, I would bring it into OSM 
(as NHD:FCode). This handles the likely event that someone finds an issue 
with the tags that you chose to use.
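The 1:1 test can be made mechanical. A minimal sketch, assuming a hand-written FCode-to-tag table; the FCodes and OSM tags below are illustrative assumptions only (check them against the NHD FCode definitions and your own tagging plan), chosen so that two codes collapse to the same tags, which is exactly the lossy case where NHD:FCode should be kept. The table and function names are hypothetical.

```python
from typing import Dict

# Illustrative only: FCodes and tag choices are assumptions for this sketch.
FCODE_TO_OSM: Dict[int, Dict[str, str]] = {
    46000: {"waterway": "stream"},                         # assumed: Stream/River, no category
    46006: {"waterway": "stream"},                         # assumed: Stream/River, perennial
    46003: {"waterway": "stream", "intermittent": "yes"},  # assumed: Stream/River, intermittent
}


def is_one_to_one(mapping: Dict[int, Dict[str, str]]) -> bool:
    """True if no two FCodes produce the same OSM tag set, so the FCode
    could be recovered from the OSM tags alone."""
    tag_sets = [tuple(sorted(tags.items())) for tags in mapping.values()]
    return len(tag_sets) == len(set(tag_sets))


def osm_tags(fcode: int, mapping: Dict[int, Dict[str, str]] = FCODE_TO_OSM) -> Dict[str, str]:
    """Translate an FCode into OSM tags, also keeping the original code as
    NHD:FCode when the mapping is lossy or the code is unmapped."""
    tags = dict(mapping.get(fcode, {}))  # copy so the table is never mutated
    if fcode not in mapping or not is_one_to_one(mapping):
        tags["NHD:FCode"] = str(fcode)
    return tags
```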


>- Features in the import data generally have two identifying fields,
>ReachCode and ComID. They both seem to be defined as nationally unique
>identifiers, with the ComID as the 'National Database Key'. However I
>suspect that soon after importing either I or others will want to combine
>ways sharing the same ReachCode (which are adjacent) but with different
>ComIDs. The main rivers are composed of a lot of short segments with just a
>few nodes each (rivers are split into separate polylines every time
>waterways come together). Because the ComIDs identify the individual
>polylines in the source data, this means that ComIDs will not map cleanly
>and would either be appended together or lost (like tiger:tlid from what I
>can tell) when ways are joined.

This is one of the things I dislike about the NHD imports that I've seen: 
the vast number of short segments creates such a high feature density that 
any manipulation of the area (downloading, editing, etc.) becomes more 
difficult. These short segments should be joined before import. You can then 
generate and keep a table of which segments were combined. I would carry the 
ReachCode and ComID into OSM, using some sort of pseudo-ComID for these 
pre-combined ways.
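A rough sketch of the pre-import join and the lookup table. It assumes each flowline arrives as (ComID, ReachCode, coordinates), that flowlines are digitized in flow direction so consecutive segments of a reach meet end-to-start, and that a pseudo-ComID of the form `<ReachCode>-<n>` is acceptable; all of these, and the function name, are hypothetical choices for illustration. Forks are left as separate chains rather than guessed at.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Coord = Tuple[float, float]
Segment = Tuple[str, str, List[Coord]]  # (ComID, ReachCode, coordinates)


def join_by_reachcode(segments: List[Segment]):
    """Chain segments that share a ReachCode and meet end-to-start.

    Returns (ways, table): `ways` maps a pseudo-ComID to (ReachCode, merged
    coordinates); `table` maps the same pseudo-ComID to the ordered list of
    source ComIDs that were combined into it.
    """
    by_reach: Dict[str, List[Segment]] = defaultdict(list)
    for seg in segments:
        by_reach[seg[1]].append(seg)

    ways: Dict[str, Tuple[str, List[Coord]]] = {}
    table: Dict[str, List[str]] = {}
    for reach, segs in sorted(by_reach.items()):
        starts: Dict[Coord, List[Segment]] = {}
        for seg in segs:
            starts.setdefault(seg[2][0], []).append(seg)
        ends = {seg[2][-1] for seg in segs}
        used = set()
        # Start chains at segments whose first node is not another segment's
        # last node, then pick up any leftovers (e.g. loops) afterwards.
        heads = [s for s in segs if s[2][0] not in ends] + segs
        for head in heads:
            if head[0] in used:
                continue
            chain = [head]
            used.add(head[0])
            while True:
                nxt = [s for s in starts.get(chain[-1][2][-1], []) if s[0] not in used]
                if len(nxt) != 1:
                    break  # end of the reach, or a fork we won't guess at
                chain.append(nxt[0])
                used.add(nxt[0][0])
            coords = list(chain[0][2])
            for seg in chain[1:]:
                coords.extend(seg[2][1:])  # skip the shared junction node
            pseudo_id = f"{reach}-{len(table) + 1}"  # hypothetical pseudo-ComID scheme
            ways[pseudo_id] = (reach, coords)
            table[pseudo_id] = [seg[0] for seg in chain]
    return ways, table
```

The returned `table` is the record of which segments were combined; keeping it alongside the import lets any pseudo-ComID in OSM be traced back to the original ComIDs.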


>What is the utility of these IDs once the
>data is in OSM? Would it be better to just not import ComIDs to begin with?

It provides the ability to trace back to the source data in case you (or 
others) find something that could have been done better, and also provides 
the ability to update OSM from updated NHD data in the future.


--
Alan Mintz <Alan_Mintz+OSM at Earthlink.net>



