[Talk-us] Tools for importing National Hydrography Dataset?
Chris Lawrence
lordsutch at gmail.com
Thu Apr 9 23:18:50 BST 2009
On Thu, Apr 9, 2009 at 4:06 PM, Ian Dees <ian.dees at gmail.com> wrote:
> On Thu, Apr 9, 2009 at 4:02 PM, Chris Lawrence <lordsutch at gmail.com> wrote:
>>
>> Also included is my script that merges duplicated nodes in the
>> shapefile, which are common along shared boundaries. It's not quite
>> as good a solution as using line-based borders, but I couldn't wrap
>> my brain around a sensible way to automate the conversion from
>> polygons to lines. This one requires the Python lxml library (on
>> Debian/Ubuntu, apt-get install python-lxml). Thus far the filenames
>> are hardcoded, which I may fix later.
>
> Chris, I'm not sure what you'd say your expertise in Java is, but I've
> been looking for some open time to implement the de-duplication of nodes
> in my Java version of shp-to-osm. Think you could submit a patch and/or talk
> me through the algorithm?
Ian - The basic approach I took was to do it as a separate step after
the main conversion (hence the separate script). Basically the
procedure is (a rough Python sketch follows the steps):
- Read in the XML tree from the converted OSM file.
- For each <node>, determine if you have already seen a node at the
same location. If you haven't, create a mapping (lat, lon) -> id. If
there is already a node at (lat, lon), create a separate mapping
duplicateid -> originalid and delete the <node>.
- For each <nd>, if the ref points to a duplicateid (i.e., duplicateid
-> originalid exists in the mapping), replace the ref with originalid.
- Dump out the edited XML tree.
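For concreteness, here's a rough sketch of that post-processing step
using lxml. This isn't my exact script; the filenames are made up, and
it assumes duplicate nodes carry byte-identical lat/lon strings, which
is what the converter produces:

from lxml import etree

tree = etree.parse('output.osm')   # converted file (made-up name)
root = tree.getroot()

seen = {}    # (lat, lon) -> id of the first node seen at that location
remap = {}   # duplicate id -> original id

for node in root.findall('node'):
    key = (node.get('lat'), node.get('lon'))
    nid = node.get('id')
    if key in seen:
        remap[nid] = seen[key]     # record duplicateid -> originalid
        root.remove(node)          # delete the duplicate <node>
    else:
        seen[key] = nid

# Point <nd> refs at the surviving node wherever they referenced a
# deleted duplicate.
for nd in root.iter('nd'):
    ref = nd.get('ref')
    if ref in remap:
        nd.set('ref', remap[ref])

tree.write('merged.osm', xml_declaration=True, encoding='UTF-8')

Note that findall() materializes the list of nodes up front, so
removing elements while looping over it is safe.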
You could do this as part of the conversion itself, using the same
approach: one mapping/dictionary for the node locations and one for
the duplicate map (see the sketch after this paragraph); if I get
bored I may modify polyshp2osm to optionally merge nodes at duplicate
locations. The downside is that you'd have to ensure each node ends up
in the same file as the ways it is part of, since negative node ids
only make sense within one file; in practice that means you can't make
more than one .osm file per .shp.* Not a huge problem when running
bulk_upload.pl, but JOSM etc. may hate reading files with 100k+ nodes.
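If you did fold it into the conversion, the location map can double as
the id allocator. Something like this hypothetical helper (illustrative
only, not actual polyshp2osm code):

class NodePool:
    """Hand out one negative id per distinct (lat, lon)."""
    def __init__(self):
        self.ids = {}       # (lat, lon) -> negative node id
        self.next_id = -1

    def id_for(self, lat, lon):
        # Reuse the id already allocated for this location, if any.
        key = (lat, lon)
        if key not in self.ids:
            self.ids[key] = self.next_id
            self.next_id -= 1
        return self.ids[key]

Every way then asks the pool for its node ids, so shared boundary nodes
get the same id automatically and no separate merge pass is needed.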
Chris
* Well, you *can*, but you have to be smarter about it than simply
chopping the file every n objectids; you'd really need to make two
passes over the data. I don't think that's worthwhile in most cases.
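For what a two-pass split might look like (purely illustrative, and
again, probably not worth doing): a first pass could group ways that
share any node, so a shared node never straddles two output files, and
a second pass could pack those groups into files. A sketch, with
made-up names and a dict of way id -> list of node ids as input:

def split_ways(ways, chunk_limit):
    """Group ways so ways sharing a node land in the same chunk."""
    # Pass 1: union-find over ways that share any node.
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    node_owner = {}   # node id -> some way id that uses it
    for wid, nds in ways.items():
        parent[wid] = wid
        for n in nds:
            if n in node_owner:
                union(wid, node_owner[n])
            else:
                node_owner[n] = wid

    # Pass 2: pack each connected component into chunks of at most
    # chunk_limit ways (a single oversized component still stays whole).
    components = {}
    for wid in ways:
        components.setdefault(find(wid), []).append(wid)

    chunks, current = [], []
    for comp in components.values():
        if current and len(current) + len(comp) > chunk_limit:
            chunks.append(current)
            current = []
        current.extend(comp)
    if current:
        chunks.append(current)
    return chunks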