[OSM-dev] HOW TO MAKE THE OSM RIVER SYSTEMS AND WHY?

Mon Sep 30 08:48:05 UTC 2013

The subject of this compact overview is complex; covered in full detail it
would fill a book. It also comes long before any coding, and it belongs
more to solution architecture than to the application area. It may
therefore interest only a limited group of developers, particularly those
interested in vector mapping model options based on OSM data. At the same
time, the state of the OSM mapping systems shows that the subject is still
highly relevant. At the risk of being misunderstood, we present the basic
phases of the model and the major steps/functions in each phase.

-In the first phase we extract four data layers from the regular OSM dump.
These are in a pure geometry format and the layers/data-sets are:
riverbanks, rivers, lakes and river-lines. This is the only place where we
use tags and relations. After this phase we work with the topology and the
geometry.

-The next phase is redundancy elimination within each of the data layers.
The steps/functions we perform are:

Elimination of the replicated consecutive nodes and consecutive edges;

Elimination of exactly replicated poly-objects (polygons or poly-lines); and

Elimination of the almost replicated poly-objects (corridor criterion
based).

In this phase we remove roughly 9.5% of the nodes (several million).
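
The first two elimination steps can be illustrated with a minimal sketch.
It assumes each poly-object is a plain list of (x, y) node tuples; the
function names are hypothetical and not taken from our implementation.
Replicated consecutive edges, closed polygons that start at a different
node, and the corridor criterion for almost replicated objects are not
covered here.

```python
def drop_consecutive_duplicates(nodes):
    """Remove every node that repeats its immediate predecessor."""
    cleaned = []
    for node in nodes:
        if not cleaned or cleaned[-1] != node:
            cleaned.append(node)
    return cleaned


def canonical_key(nodes):
    """Direction-independent key: a line and its reverse compare equal."""
    forward = tuple(nodes)
    backward = tuple(reversed(nodes))
    return min(forward, backward)


def drop_exact_replicas(polylines):
    """Keep the first occurrence of each polyline, ignoring direction."""
    seen = set()
    unique = []
    for line in polylines:
        key = canonical_key(line)
        if key not in seen:
            seen.add(key)
            unique.append(line)
    return unique
```

Running the node clean-up first matters: two poly-objects that differ only
by a doubled node become exact replicas once the doubled node is gone.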

-Next, we transform each of the four geometries into simple geometries
(simple areas and simple polylines). Roughly we perform the following steps:

For the area layers we correct/repair any open polygons (out of some ten
thousand we repair over 90%). Where possible, border segments are
connected, gaps are closed and polygonal line end-points are snapped or
connected.

Next, we transform the polygon sets into simple area structures (one
outer/container polygon and n>=0 inner/hole polygons), separately for each
of the area data layers. For the river-lines/poly-lines, the simple
polylines contain only nodes of order <=2.
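
One way to obtain polylines whose nodes all have order <=2 is to cut every
line at the junction nodes. The sketch below is only an illustration under
that assumption: node order is taken to mean the number of distinct
neighbouring nodes in the whole layer, lines are lists of (x, y) tuples
without doubled consecutive nodes, and the function names are hypothetical.

```python
from collections import defaultdict


def node_degrees(polylines):
    """Count the distinct neighbours of every node across all polylines."""
    neighbours = defaultdict(set)
    for line in polylines:
        for a, b in zip(line, line[1:]):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {node: len(adjacent) for node, adjacent in neighbours.items()}


def split_at_junctions(polylines):
    """Cut each polyline at nodes that have more than two neighbours."""
    degree = node_degrees(polylines)
    pieces = []
    for line in polylines:
        current = [line[0]]
        for node in line[1:]:
            current.append(node)
            if degree[node] > 2:
                # A junction ends the current piece and starts the next one.
                pieces.append(current)
                current = [node]
        if len(current) > 1:
            pieces.append(current)
    return pieces
```

After the split, a junction node appears only as an end-point of the pieces
that meet there, so no piece has an inner node of order above 2.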

-In the next phase we create the large areas that cover
neighbouring/connected areas from the three area layers. The steps here are:

From the lake areas we extract only those having considerable overlaps with
at least one area from the riverbanks or rivers data layers. The rest of
the lakes are irrelevant here. Note that there is a large number of
overlaps among the elements of the three area layers (any combination of
areas from riverbanks, rivers and lakes). Unfortunately, as a rule, the
overlapping areas have different topologies (structures), and this is
another cause of systematic OSM errors.

In the next step we merge the three simple area sets into one, and from
here on the area source attribute is ignored.

Next, we smash a copy of the simple area set into a (huge) number of
non-crossing and non-overlapping vectors. Out of these, we keep only the
real, pure border vectors (at least one simple area interior on one side
and no area interior on the other side). We then connect the border vectors
into border polygons.
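
The border-vector idea can be sketched for a strongly simplified case. The
sketch assumes that the input areas only touch along shared edges with
identical nodes (no partial overlaps, so no real smashing is needed), that
all rings are given counter-clockwise without a repeated closing node, and
that every border node has exactly one outgoing border edge. Under these
assumptions an edge used by one ring only has area interior on one side
alone. The function names are hypothetical.

```python
from collections import Counter


def ring_edges(ring):
    """Yield the directed edges of a closed ring, including the closing one."""
    for i, a in enumerate(ring):
        yield a, ring[(i + 1) % len(ring)]


def border_edges(rings):
    """Keep edges used by exactly one ring; shared edges are interior."""
    usage = Counter()
    for ring in rings:
        for a, b in ring_edges(ring):
            usage[frozenset((a, b))] += 1
    kept = []
    for ring in rings:
        for a, b in ring_edges(ring):
            if usage[frozenset((a, b))] == 1:
                kept.append((a, b))
    return kept


def chain_into_rings(edges):
    """Connect directed edges head-to-tail into closed border polygons."""
    next_node = {a: b for a, b in edges}
    rings = []
    while next_node:
        start, node = next(iter(next_node.items()))
        del next_node[start]
        ring = [start]
        while node != start:
            ring.append(node)
            node = next_node.pop(node)
        rings.append(ring)
    return rings
```

For two unit squares sharing one side, the shared side is dropped and the
remaining six edges chain into a single border polygon around both squares.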

Finally, the border polygon set is transformed, again, into a set of simple
areas (with a correct orientation of the container and hole border
polygons). These simple areas are simple only in their structure. In fact,
they are huge areas, each covering a large number of the input area
fragments, and they correspond to the natural river systems (like the
Danube, the Amazon, the Mississippi river system, and so on).
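
The orientation step can be illustrated with the signed (shoelace) area of
a ring. The convention used here, counter-clockwise containers and
clockwise holes, is only an assumption for the sketch, as are planar
coordinates, the dictionary layout and the function names.

```python
def signed_area(ring):
    """Shoelace formula: positive for counter-clockwise rings."""
    total = 0.0
    for i, (x1, y1) in enumerate(ring):
        x2, y2 = ring[(i + 1) % len(ring)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def orient(ring, counter_clockwise):
    """Return the ring in the requested orientation, reversing if needed."""
    is_ccw = signed_area(ring) > 0
    if is_ccw == counter_clockwise:
        return list(ring)
    return list(reversed(ring))


def make_simple_area(container, holes):
    """Build one simple area: a CCW container and zero or more CW holes."""
    return {
        "outer": orient(container, True),
        "holes": [orient(hole, False) for hole in holes],
    }
```

With a fixed convention, later phases can tell a container from a hole by
the sign of the area alone, without looking at the nesting again.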

In this final set of river systems we managed to remove many thousands of
systematic logical errors and a huge volume of redundancy.

-This final phase is optional. From all river-line poly-lines we extract
only those having at least one node in common with a river system/area.
From each of these we remove the sections that lie fully inside an area. In
this way we end up with a set of river-line segments that have one or two
endpoints in common with one or two river areas (indicating side-rivers,
connectivity or errors). Note that there is a large number of cases where a
river-line section runs along the same river area (causing virtual islands
or thickness). This, again, is a source of another systematic logical
error.
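
The removal of inside sections can be sketched with a plain ray-casting
point-in-polygon test. This is a rough illustration only: it assumes an
area without holes given as a ring of (x, y) tuples, treats a segment as
fully inside when both of its end nodes are inside, and does not handle
nodes lying exactly on the border. The function names are hypothetical.

```python
def point_in_ring(point, ring):
    """Ray casting: count border crossings of a ray going in +x direction."""
    x, y = point
    inside = False
    for i, (x1, y1) in enumerate(ring):
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def outside_sections(line, ring):
    """Split a polyline, dropping segments whose two end nodes are inside."""
    flags = [point_in_ring(node, ring) for node in line]
    sections = []
    current = []
    for i in range(len(line) - 1):
        if flags[i] and flags[i + 1]:
            # Segment lies inside the area: close the running section.
            if len(current) > 1:
                sections.append(current)
            current = []
        else:
            if not current:
                current = [line[i]]
            current.append(line[i + 1])
    if len(current) > 1:
        sections.append(current)
    return sections
```

A river-line that crosses an area is cut into an incoming and an outgoing
section, each ending at the first node found inside the area.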

Finally, let us mention some arguments for making the OSM river systems:

-The simple area structures reflect the structure of the natural river
systems in the best way.

-The result contains a very limited amount of redundancy and replication.

-The simple area structure of the river systems is a precondition for the
later efficient processing phases, such as the generation of zoom levels
(generalization by vector smoothing), tiling and so on.

-The result contains only a very limited number of the systematic errors
present in the OSM source data (and present in most of the mapping systems
based on OSM data).

-It is an excellent help in detecting gaps that still exist in the river
data layers and in making different estimates.

These are just some of them.

Sandor

If interested, more details, illustrations and examples can be found in the
white paper here:

https://docs.google.com/file/d/0B6qGm3k2qWHqSEVObDhscFFSS00/edit?usp=sharing