[OSM-dev] Moving to stricter multipolygon parsing

Paul Norman penorman at mac.com
Thu Jun 12 23:25:42 UTC 2014

Osm2pgsql currently tries *very* hard to turn multipolygon relations into
geometries. It detects two types of MP relations, new-style and old-style.
A new-style MP has tags on the relation, while an old-style MP has only
type=multipolygon on the relation and relies on the ways for the tags.
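The new-style/old-style distinction can be sketched in a few lines of
Python. This is a minimal illustration, not osm2pgsql's actual code: it
assumes tags are plain dicts, that "type" is ignored when deciding the
style, and that DELETED_TAGS is a hypothetical, shortened stand-in for the
real list of tags osm2pgsql drops ("source" is the example from the post).

```python
# Hypothetical, shortened stand-in for osm2pgsql's list of dropped tags.
DELETED_TAGS = {"source", "created_by"}


def significant_tags(tags: dict[str, str]) -> dict[str, str]:
    """Return the tags that remain after dropping the 'deleted' ones."""
    return {k: v for k, v in tags.items() if k not in DELETED_TAGS}


def mp_style(relation_tags: dict[str, str]) -> str:
    """Classify a type=multipolygon relation as 'new' or 'old' style.

    New-style: the relation carries significant tags besides type.
    Old-style: type=multipolygon is all that is left, so the tags
    would have to come from the member ways.
    """
    rest = significant_tags(relation_tags)
    rest.pop("type", None)  # type=multipolygon does not describe the area
    return "new" if rest else "old"
```

For example, a relation tagged type=multipolygon + landuse=forest is
classified as new-style, while one tagged only type=multipolygon +
source=survey counts as old-style, because source is dropped.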

Osm2pgsql then tries to deal with odd tagging in various ways. MP handling
is one of the biggest sources of osm2pgsql bug reports[1] and a big
time-sink.

One of the bigger issues is that tags get moved from the ways to MPs that
are falsely detected as old-style, an attempt to interpret flawed tagging.

I think we need to move to stricter parsing of MPs, accepting only
new-style MPs and old-style MPs where all outers have identical tags and
the relation itself has no non-deleted tags.
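A rough Python sketch of the proposed acceptance rule follows. It is an
illustration under stated assumptions, not osm2pgsql code: members are
hypothetical (role, way_tags) tuples, "type" is not counted as a relation
tag, DELETED_TAGS is a hypothetical shortened list, and rejecting old-style
MPs with no outers or with untagged outers is an added assumption (there
would be nothing to tag the area with).

```python
from typing import Optional

# Hypothetical, shortened stand-in for osm2pgsql's list of dropped tags.
DELETED_TAGS = {"source", "created_by"}


def significant_tags(tags: dict[str, str]) -> dict[str, str]:
    """Return the tags that remain after dropping the 'deleted' ones."""
    return {k: v for k, v in tags.items() if k not in DELETED_TAGS}


def strict_mp_tags(
    relation_tags: dict[str, str],
    members: list[tuple[str, dict[str, str]]],
) -> Optional[dict[str, str]]:
    """Return the tags for the area, or None if the MP is rejected."""
    rel = significant_tags(relation_tags)
    rel.pop("type", None)
    if rel:
        # New-style: the relation's own tags define the area.
        return rel

    # Old-style: every outer way must carry identical significant tags.
    outers = [significant_tags(t) for role, t in members if role == "outer"]
    if not outers or not outers[0]:
        # Assumption: nothing usable to tag the area with.
        return None
    first = outers[0]
    if all(tags == first for tags in outers[1:]):
        return first
    # Conflicting outer tags: ambiguous, so reject instead of guessing.
    return None
```

The design choice is to return None rather than guess: a rejected MP simply
does not render, which is the feedback the mapper needs to fix the tagging.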

Osm2pgsql is not just a consumer of data; it is one of the main feedback
tools and is strongly integrated into the feedback cycle. If it doesn't
process a multipolygon, a mapper will likely correct the tagging. Doing
this will also make things easier for others interpreting raw OSM data.

To support this, I looked for some numbers. Using a shortened deleted tags
list, there are 1 million new-style and 261k old-style MPs. Of these, 256k
have a member with role outer. 251k of these have entirely consistent tags
on outers, while 2.3k have two sets of tags among the ways. About 180 have
three or more.[3] An old-style MP without entirely consistent tags on
outers is ambiguous and in error.
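The bucketing behind these counts can be sketched as counting the distinct
tag sets among each MP's outer ways. This is an illustrative Python sketch,
not the query used for the figures above: the (role, way_tags) member
format and DELETED_TAGS list are hypothetical, and counting an untagged
outer as its own (empty) tag set is an assumption.

```python
from collections import Counter

# Hypothetical, shortened stand-in for osm2pgsql's list of dropped tags.
DELETED_TAGS = {"source", "created_by"}


def outer_tag_set_count(members: list[tuple[str, dict[str, str]]]) -> int:
    """Number of distinct significant tag sets among the outer ways."""
    distinct = {
        # frozenset of (key, value) pairs makes a tag dict hashable
        frozenset((k, v) for k, v in tags.items() if k not in DELETED_TAGS)
        for role, tags in members
        if role == "outer"
    }
    return len(distinct)


def bucket(members: list[tuple[str, dict[str, str]]]) -> str:
    """Place one old-style MP into the categories used in the post."""
    n = outer_tag_set_count(members)
    if n == 0:
        return "no outers"
    if n == 1:
        return "consistent"
    if n == 2:
        return "two sets"
    return "three or more"


def summarize(old_style_mps: list[list[tuple[str, dict[str, str]]]]) -> Counter:
    """Count how many old-style MPs fall into each bucket."""
    return Counter(bucket(m) for m in old_style_mps)
```

Only the "consistent" bucket would pass the stricter parsing; the rest are
exactly the ambiguous cases described above.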

[2]: A deleted tag is one, such as source, that osm2pgsql drops.
[3]: https://gist.github.com/pnorman/ebd41f5a1759916a48b5
