[Tagging] Extremely long Amtrak route relations / coastline v. water

Mon Nov 23 03:10:49 UTC 2020

On Sun, Nov 22, 2020 at 8:04 PM Brian M. Sperlongano <zelonewolf at gmail.com>
wrote:

> Therefore, a holistic solution is needed for large objects.  Setting an
> api limit is good because it gives consumers a guarantee about the
> worst-case object they might have to handle.  However, it must also be
> combined with a replacement mechanism for representing large objects.  The
> 2,000 node limit for a way is fine because longer ways can be combined via
> relations.  If the relation member limit were capped, you create a class of
> objects that cannot be represented in the data set.
>

We've already substantially solved that problem for routes. Super-relations
seem to work well, and only rarely do we even need a three-level hierarchy.
As Steve points out, we could go deeper, but there's no need.

>
> What I think is missing is a way to store huge multipolygons in such a way
> that they can be worked with in a piecemeal way.  The answer that
> immediately comes to mind is a scheme where large objects are represented
> as relations of relations, where portions of a huge multipolygon are
> chopped up into fragments and stored in subordinate multipolygon
> relations.  This hierarchy could perhaps nest several levels if needed.
> Now a 40,000 member relation could be composed of 200 relations of 200
> members each, with each subordinate relation member being a valid
> multipolygon with disjoint or adjacent portions of the overall geometry.
>
> Then, an editor could say "here is a large relation, I've drawn bounding
> boxes for the 200 sub-relations, if you select one, I'll load its data and
> you can edit just that sub-relation".
>

> This could *almost* work under the current relation scheme (provided new
> relation types are invented to cover these types of data structures, and
> consumers roger up to supporting such hierarchical relations).  The thing
> that makes this fail for interactive data consumers (such as an editor or a
> display) is that *there's no way to know where relation members are,
> spatially, within the relation*.  The api does not have a way to say
> "what is the bounding box of this object?"  A consumer would need to
> traverse down through the hierarchy to compute the inner bounding boxes,
> which defeats the purpose of subdividing it in the first place.
>

You're right that it's a problem, but you misdiagnose the details. Rather
than identifying bounding boxes, which is easy, the problem comes down to
identifying topology - is a given point in space on the inside or outside
of the multipolygon? Answering that question takes, at a minimum, one of
two things. The first is the 'crossing number' (often loosely called the
'winding number') - essentially, if you draw a mathematical ray from the
point to infinity in a given direction, how many times do you cross the
boundary of the region? (Odd = inside, even = outside).  The second is a
requirement in the data model that the boundaries of regions must follow a
particular winding direction; most GIS systems use the "right hand rule" of
specifying that as you proceed along a boundary way, the interior of a
relation should be on your right.
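
A minimal sketch of the ray-crossing test described above, in Python. It
assumes planar (x, y) coordinates and rings given as plain lists of vertices;
the function names are made up for illustration and are not taken from any
OSM tool. Note that it needs every ring of the multipolygon, but cares about
neither the order of the rings nor the direction of the ways.

```python
def point_in_ring(x, y, ring):
    """Even-odd test: cast a ray toward +x and count boundary crossings."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside  # one more crossing flips the parity
    return inside


def point_in_multipolygon(x, y, rings):
    """Odd number of enclosing rings (outer or inner) means inside."""
    enclosing = sum(1 for ring in rings if point_in_ring(x, y, ring))
    return enclosing % 2 == 1
```

With an outer square and an inner square cut out of its middle, a point in
the hole is enclosed by two rings and so comes out as "outside".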

The second rule is by far the easier to implement. Unfortunately, it's
also inconsistent with OSM's base data model. The problem is that we do not
necessarily require multipolygons to be sorted in any particular order
(depending on client software to order them if necessary), nor do we
require the boundary ways to proceed in any particular direction with
respect to the multipolygon.  In fact, we cannot require the boundary ways
to proceed in a particular direction, since shared ways between adjacent
multipolygons are a fairly common practice. The practice is somewhat
controversial; nevertheless, it seems like a good idea when the adjoining
regions by their nature are both known to touch and known to be mutually
exclusive. Examples are the lines that separate landuse from landuse,
landcover from landcover, administrative region from administrative region,
land from water, or cadastral parcel from cadastral parcel (where cadastre
is accepted, as it is with objects like public recreational land).
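
To make the shared-way conflict concrete, here is a small Python sketch. It
assumes planar coordinates and two hypothetical unit-square regions, "west"
and "east", that share one edge. Both rings are wound the same way (positive
signed area, i.e. counter-clockwise), yet the shared edge has to be walked in
opposite directions, so a single stored way cannot have the "right" direction
for both neighbours.

```python
def signed_area(ring):
    """Shoelace formula: positive for counter-clockwise, negative for clockwise."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def directed_edges(ring):
    """The set of (from, to) vertex pairs met when walking the ring in order."""
    n = len(ring)
    return {(ring[i], ring[(i + 1) % n]) for i in range(n)}


# Two adjacent unit squares, both wound counter-clockwise.
west = [(0, 0), (1, 0), (1, 1), (0, 1)]
east = [(1, 0), (2, 0), (2, 1), (1, 1)]

# The edge between (1, 0) and (1, 1) is shared, but each ring walks it
# in the opposite direction from the other.
shared_in_west = ((1, 0), (1, 1)) in directed_edges(west)
shared_in_east = ((1, 1), (1, 0)) in directed_edges(east)
```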

Except for monsters such as the World Ocean (the coastline is a perpetual
headache), seas, and objects with extremely complex topology, the problem
is somewhat manageable. A single 'ring' (the cycle of contiguous ways,
inner or outer, that form one region of a multipolygon) or a single
'complex polygon' (an outer way and any inner ways subordinate to it) is
generally quite manageable in terms of data volume.  I can edit shorelines
of the Great Lakes, for instance, with some confidence, by loading into
JOSM all the data near the single stretch of shoreline that I'm working on,
plus the entire outer perimeter of the lake (using the 'download incomplete
members' function); having the shoreline outside the immediate region of
interest doesn't stress the memory even of a somewhat obsolete laptop
computer. Not all editors are as competent with managing large relations -
I've never, for instance, grown comfortable with attempting similar tasks
in any of the browser-based ones I've tried. I used Merkaartor briefly
during a time when the large relations were causing random JOSM crashes
(something to do with interactions with accessibility extensions when
painting the data in the UI), and it was also fairly workable, so this
isn't a JOSM advertisement, necessarily.

The objects that typically give me the worst headaches aren't necessarily
the largest ones - as I said, I deal with long routes such as the
Appalachian Trail, or large areas such as the Great Lakes - but rather the
diffuse ones. (Many National Forests are both!)  Editing a messy multipolygon
like https://www.openstreetmap.org/relation/6360587 - particularly one
where the ways are shared with other objects (as where a recreation area
shares boundaries with an adjacent wilderness area, or is defined by a
shoreline or a stream centerline) - is, as an elderly relative of mine used
to put it, "a pain where a pill don't fix it!"

I do not agree at all with the contention that nothing is lost by breaking
the association among the individual fragments of such a diffuse area.
They share a name, an administrative authority, a management plan, a web
site, a set of regulations, and so on.  They are the parts of a whole that
happens to be fragmented into a lot of spatially disjoint, although loosely
grouped, pieces.  I do understand that "relations are not categories" but
I'm not trying to create a relation for "all Wild Forest areas" or "all New
York State lands", but rather for the particular facility known as the
"Wilcox Lake Wild Forest." The neighbours and visitors of that forest do
conceptualize it as a single thing, so we do lose a lot if you tell me
"just don't map that way."

Extracting a geographic region from a large multipolygon for rendering is
somewhat a solved problem, although implementations in particular tools
vary. There are a number of named algorithms related to the issue.
Wikipedia offers some good jumping-off points:

Sutherland-Hodgman:
https://en.wikipedia.org/wiki/Sutherland%E2%80%93Hodgman_algorithm
Weiler-Atherton:
https://en.wikipedia.org/wiki/Weiler%E2%80%93Atherton_clipping_algorithm
Greiner-Hormann:
https://en.wikipedia.org/wiki/Greiner%E2%80%93Hormann_clipping_algorithm
Vatti: https://en.wikipedia.org/wiki/Vatti_clipping_algorithm (see also
https://en.wikipedia.org/wiki/Bentley%E2%80%93Ottmann_algorithm)

They work quite well in practice for rendering and geocoding in limited
geographic areas. The spatial indexing of the relational databases we use
also performs well in practice except for the case where the region is both
large and topologically complex.
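
As an illustration of the simplest of these, here is a minimal Python sketch
of Sutherland-Hodgman clipping of one ring against an axis-aligned bounding
box. It assumes planar coordinates, and the function name is made up for this
example. The algorithm only requires the clip region to be convex; with a
concave subject ring it can leave degenerate edges along the box boundary,
which is one reason the other algorithms listed above exist.

```python
def clip_ring_to_bbox(ring, xmin, ymin, xmax, ymax):
    """Clip a ring (list of (x, y) vertices) to a box, one box edge at a time."""

    def clip(points, inside, intersect):
        out = []
        for i, cur in enumerate(points):
            prev = points[i - 1]  # wraps around to the last vertex when i == 0
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))  # entering the half-plane
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))      # leaving the half-plane
        return out

    def at_x(x):
        # Only called for segments that cross x, so q[0] != p[0].
        def intersect(p, q):
            t = (x - p[0]) / (q[0] - p[0])
            return (x, p[1] + t * (q[1] - p[1]))
        return intersect

    def at_y(y):
        # Only called for segments that cross y, so q[1] != p[1].
        def intersect(p, q):
            t = (y - p[1]) / (q[1] - p[1])
            return (p[0] + t * (q[0] - p[0]), y)
        return intersect

    box_edges = [
        (lambda p: p[0] >= xmin, at_x(xmin)),
        (lambda p: p[0] <= xmax, at_x(xmax)),
        (lambda p: p[1] >= ymin, at_y(ymin)),
        (lambda p: p[1] <= ymax, at_y(ymax)),
    ]
    points = list(ring)
    for inside, intersect in box_edges:
        if not points:
            break  # the ring lies entirely outside the box
        points = clip(points, inside, intersect)
    return points
```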

The key issue for editing is that edits must ensure topologic consistency.
Most proposals that I've seen for representing large multipolygons by
subdivision fail at this - they still require loading the entire
multipolygon to verify that the portion being edited does not introduce
crossing ways or disconnect the boundary. This is the perennial problem
with the coastline -
it's never complete and consistent, so the generalization of the coastline
never seems to happen.
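
To show why the whole object is needed, here is a rough Python sketch of just
one of those checks, the "boundary is not disconnected" part. It assumes each
member way is given as a list of node ids (the function name and the sample
ids are invented for illustration). It is a necessary condition only: it does
not detect crossing ways, and it can only give the right answer when handed
every member way of the multipolygon.

```python
from collections import Counter


def open_end_nodes(ways):
    """Return node ids at which the boundary is left open.

    Each unclosed way contributes its two end nodes. In a set of closed
    rings every such node is used an even number of times, so any node
    with an odd count marks a gap in the boundary.
    """
    ends = Counter()
    for nodes in ways:
        if nodes[0] == nodes[-1]:
            continue  # a closed way already forms a ring by itself
        ends[nodes[0]] += 1
        ends[nodes[-1]] += 1
    return sorted(node for node, count in ends.items() if count % 2 == 1)
```

Run against only a fragment of a ring, the check reports the fragment's two
ends as open, which is exactly what happens when an editor has loaded part of
a large multipolygon and not the rest.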

Apologies to the 'tagging' mailing list in that I'm wandering off into data
storage, data retrieval, editing and rendering technology, none of which
really bears on how the objects are mapped and tagged.  There's almost
certainly a better forum in which to hash out design details of a data
model that addresses Brian's issue satisfactorily, and I'll happily follow
to wherever the discussion of the technological problems moves.

-- 
73 de ke9tv/2, Kevin