[OSM-talk] Fixing broken multipolygons, some notes

Sandor Seres sandors39 at gmail.com
Sat Mar 18 20:40:26 UTC 2017

I am new to this list, so I apologize for any misinterpretations and for
any lapses in style. The motivation for this mail is a worrying mail on the
local list about the poorer osm2pgsql-based maps and about the strategies
for fixing "broken polygons". The white spots in the Scandinavian forests
mentioned there are just an illustration. If broken polygons are simply
dropped, empty spots will become typical for any area type in any corner of
the planet.

As I understand it, osm2pgsql is an application that prepares the OSM source
data into a DB used by many mapmakers for rendering. We can see that almost
all OSM-based public mapping systems use this database and consequently
repeat the same anomalies. Therefore, making osm2pgsql more robust could
perhaps be a better strategy. There is still a large potential for such
strengthening. Just waiting for "do-ocracy" repairs is really a long-term
strategy. In any case, users who start from the source OSM data will not be
affected by either of these strategies.

"Fixing broken polygons", especially programmatic/mass fixing, could be
more dangerous to all users. Just look at the many possible options for
fixing a self-crossing. Loosely defined notions are open to different
interpretations and to different sets of error criteria. Consequently, for
the same object type we may have (and we do have) different error classes
and repair tools. Besides the typical interpretations of a polygon as an
area (the ESRI polygon redefinition) or as a closed polygonal line, we
simply can't find in the documentation what the notions "outer", "inner",
"hole" ... actually mean. The interpretation (individual perception) of
these notions is left to us, and that is a source of misunderstandings. For
instance, if we assume that "outer" border polygons define the interior
candidate points (points inside and on the border) and "inner" border
polygons define (in the same way) the exterior points of an area, then
self-crossings, touching polygons, polygon overlaps, crossings ... are not
errors at all.
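
To make the interpretation above concrete, here is a minimal Python sketch
of a point test. It is only an illustration under my own assumptions: the
function names are hypothetical, rings are plain lists of (x, y) vertices,
and "inside" is decided with the even-odd (ray casting) rule, which is one
possible choice and not something the OSM documentation prescribes.

```python
def on_segment(p, a, b):
    """True if point p lies exactly on the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    if cross != 0:
        return False
    return min(ax, bx) <= px <= max(ax, bx) and min(ay, by) <= py <= max(ay, by)


def in_or_on_ring(p, ring):
    """True if p is inside the closed polygonal line or on its border.

    Uses the even-odd rule, so a self-crossing ring still gives an answer.
    """
    px, py = p
    inside = False
    n = len(ring)
    for i in range(n):
        a, b = ring[i], ring[(i + 1) % n]
        if on_segment(p, a, b):
            return True
        (ax, ay), (bx, by) = a, b
        # Count edges crossed by a horizontal ray going right from p.
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside


def is_area_point(p, outers, inners):
    """Candidate of at least one outer ring and excluded by no inner ring."""
    return (any(in_or_on_ring(p, r) for r in outers)
            and not any(in_or_on_ring(p, r) for r in inners))


# Two overlapping outer rings and one inner ring (made-up coordinates).
outers = [[(0, 0), (10, 0), (10, 10), (0, 10)],
          [(8, 8), (14, 8), (14, 14), (8, 14)]]
inners = [[(4, 4), (6, 4), (6, 6), (4, 6)]]
# A self-crossing "bow-tie" ring.
bowtie = [(0, 0), (4, 4), (4, 0), (0, 4)]
```

With this reading, the overlapping outer rings and the bow-tie need no
repair: every point simply is or is not an area point.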

However, my point here is still something else. The "broken multipolygon"
(whatever that means) issue is just "the tip of the iceberg". There remains
a huge number of anomalies caused by the relations between area objects
from different area classes. I intentionally use the notion of anomaly, as
a moderate form of error, because many people/mapmakers may live with them.
But a modern GIS system and digital cartography based on vector layers
cannot tolerate them. Let me present some arguments and illustrations. Let
us look at a map extract from the mentioned Scandinavian forests here
http://osm.org/go/0Tt1PZIt- . The example could be taken from any corner of
the planet and, as mentioned, there is a huge number of similar cases. At
first glance, everything looks correct and nice (and it is). However, on a
closer look we see that something is still wrong. The forest type symbols
are placed directly over the water. In another style, typical land-related
names are on the water, like here http://osm.org/go/0Tt1PZIt-?layers=T .
Looking at the source data we can see that the lake in the middle is placed
over an empty space (intentionally, not a hole), and the border of the lake
runs slightly inside and outside the forests. At the same time, we can see
many forest areas inside the mentioned empty space that are overwritten by
the lake, which has no holes. Consequently, there are many missing islands
in the lake and many missing forest areas in the extract. Note that on that
little extract alone there are more than 40 of the described anomalies.
What is more, there are many lakes with borders running in and out of
forest areas (corridor border overlaps), lakes with considerable parts over
a forest or over holes in forests, lakes partly overlapping several
disjoint forest areas and so on, and the other way around. Extending the
case to the planet and to other combinations of area types, we can sense
the extent of the issue. There have been attempts to compensate for these
problems in rendering, like rendering the holes, rendering smaller objects
over larger ones and so on. These actions generally do not work. Simply,
they help in some places and do damage in others. So, the question is
whether we can do anything about the problem, and what. Just waiting for
do-ocracy based repairs is, obviously, irrational. Fortunately, the source
data has a large potential for removing most of the mentioned anomalies.
Let me present some hints for the combination of forests, lakes and rivers.

Assume {F0} is the set of all forest outer border polygons (closed polygonal
lines) and {F1,L0,R0} is the set of all inner forest, outer lake and outer
river border polygons (the orientations and the relations are irrelevant).
Then one can prove the existence of a minimal coverage of the forests by
disjoint simple areas. In other words, one can find a set of isolated
simple areas (one outer and zero or more inner polygons) where any
area point is on/inside at least one element of {F0} and never on/inside
any element of {F1,L0,R0}. This coverage is the topological area
difference, or subtraction, {D}=U{F0}-U{F1,L0,R0}, where U stands for union.
Finding this coverage is really a nice challenge for researchers in
topology, algorithms and, of course, programming. Some data preparation
tools already have procedures for making this coverage for some major area
type combinations like the planet_sea/global_ocean, forests, lakes, rivers
and some more. An extract from such a coverage for the combination of
forests, lakes and rivers was presented in an image (the link is missing
from this archived copy). Note that whatever Z/rendering order one takes,
the image is always the same. The only difference may appear in the
borderline colours if hard edge rendering is used, but even this difference
disappears with the "smooth edge" anti-aliasing technology.
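
The rendering-order claim can be checked on a toy example. The Python
sketch below is only an illustration under my own assumptions: areas are
approximated by sets of unit grid cells, the rectangles and the helper
names cells() and render() are made up, and real tools work on vector
geometry instead. It builds {D}=U{F0}-U{F1,L0,R0} with plain set operations
and paints the layers in two different orders.

```python
def cells(x0, y0, x1, y1):
    """Set of unit grid cells covered by an axis-aligned rectangle."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def render(layers):
    """Paint (label, area) layers in order; later layers overwrite earlier."""
    image = {}
    for label, area in layers:
        for cell in area:
            image[cell] = label
    return image


forest_outer = cells(0, 0, 10, 10)  # one element of {F0}
forest_inner = cells(1, 1, 3, 3)    # one element of {F1}
lake = cells(4, 4, 12, 8)           # one element of {L0}, partly over the forest
river = cells(0, 8, 12, 9)          # one element of {R0}

# Topological difference on the grid: U{F0} - U{F1,L0,R0}.
forest_diff = forest_outer - (forest_inner | lake | river)

# Raw layers: the picture depends on the painting order.
raw_a = render([("forest", forest_outer), ("lake", lake), ("river", river)])
raw_b = render([("lake", lake), ("river", river), ("forest", forest_outer)])

# After the subtraction the layers are disjoint, so the order is irrelevant.
clean_a = render([("forest", forest_diff), ("lake", lake), ("river", river)])
clean_b = render([("river", river), ("lake", lake), ("forest", forest_diff)])

print(raw_a == raw_b, clean_a == clean_b)
```

The raw layers give different pictures for the two orders, while the
subtracted forest layer gives the same picture both times, because no cell
belongs to more than one layer.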

Regards, Sandor.

