[Tagging] Stop the large feature madness (was: Tag for a plateau or tableland?)

Thu Apr 18 19:26:08 UTC 2019

On Thu, Apr 18, 2019 at 12:53 PM Christoph Hormann <osm at imagico.de> wrote:

> How should they determine that based on local knowledge?  What if there
> is disagreement?  Is
> https://www.openstreetmap.org/way/83015625
> the same river as
> https://www.openstreetmap.org/way/4769426
> or
> https://www.openstreetmap.org/way/174752117
> or
> https://www.openstreetmap.org/way/234008385

I can't comment, not being familiar with the local situation.
Certainly, there is a well-known situation in the US: the 'Ohio', the
'Allegheny' and the 'Monongahela' rivers are considered by the locals
to be three distinct objects, with the first formed by the confluence
of the other two. The Ohio begins in Pittsburgh. Neither of its two
tributaries is the Ohio.  If your example is similar, it's appropriate
to separate them in the database.  I presume that 'Aar', 'Vorderrhein'
and 'Hinterrhein' are conceptually similar - sections, tributaries or
distributaries of the Rhine, with their own distinct identities.

> What if the local mappers do not speak the same language?  Do those who
> speak English automatically get to overrule those who don't?

We've dealt with the issue of names in different languages before, and
it's not necessarily a problem. A few tens of km north of me, nobody
would argue that 'Lac du Saint-Sacrement' and 'Lake George' are
distinct lakes. Whether a mapper speaks English or French, they'd
recognize the lake itself as being the same object, with different
names in two languages. 'Fleuve Saint-Laurent' and the 'Saint Lawrence
River' are the same river, and the speakers of both languages agree on
its identity, even if not its name. It's when the different cultures
or language groups disagree on the object boundaries that things get
difficult, but for the purpose of this particular discussion I'm
trying to focus on "medium sized" objects - too big to map in a day
(or maybe too big for a single mapper to maintain), but for which
there is a broad consensus about object identity.  I'm trying to
address the Jamaica Bays, Cape Cods, and the like, which at least in
my part of the world are much commoner than any objects that have
horrible political implications.

> > > Everything else in physical geography is typically mapped locally
> > > piece by piece like the rivers and creating large features - while
> > > done by some mappers for the purpose of label painting - is
> > > generally disliked by most mappers because it is very hard to work
> > > with these and represents no additional meaningful information.
> >
> > That's where we disagree. The additional information is that the
> > multiple features represent the same physical object.
>
> And how do you verifiably determine if two things are part of the same
> physical object?  For example: [examples snipped]

I'm all for a rule of, 'if in doubt, split,' possibly paired with
creating a new relation to carry the grouping.  You seem to favour a
rule of 'never join,' which is perverse for the common case where
there is broad consensus about object identity.

> > Please avoid the term "label painting." What you call "label
> > painting" is the entirely reasonable desire to have recognized, named
> > objects appear on the map with their names.
>
> I distinguish between names and labels.  Labels are graphical
> representations of names or other strings in map renderings.  The OSM
> database should not contain labels, it should contain names.

On this, we agree. To what object should the name, 'Jamaica Bay' be
assigned? How can such an object be constructed? The locals can
clearly define its extents, except for very small indefinite
boundaries over narrow entrances and exits. What should be done to
give that object, which unquestionably is observable in the field as
an entity distinct from the ocean, existence in OSM?

The last time that we had this discussion, you dismissed wanting to
have Jamaica Bay exist as a named object as 'label painting.' I was
forced then to conclude that your definition of 'label painting' is
considerably broader than simply putting meaningless objects in the
database so that names will appear. Let it be clear: my wish for
Jamaica Bay is to identify the definite portions of its boundary
accurately, to complete its indefinite boundaries for topologic
consistency, and to have it appear in the database as the area feature
that it is, with its name. Note that I did not refer to rendering in
that request. *Part* of my wish to have the name in turn triggers a
wish to render the name. That's an ordinary human desire. Map users,
when there are objects on maps, expect to see them labeled.

Since my use of OSM does not involve intensive use of OSM-Carto, I
really consider its rendering to be secondary.  If I have the named
feature, I'll worry about how to render it in the maps I produce. If I
don't have the feature somewhere, then no conceivable rendering can
show it.  But it goes beyond rendering: I want to be able to do things
like 'gaging stations in Jamaica Bay or within 500 m of it' - which
isn't a terribly difficult database query if I have the feature, and
well-nigh impossible if I don't.
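
As a rough illustration of why the area feature makes that query easy,
here is a minimal sketch in plain Python. Everything in it is an
assumption made for the example: the outline is a made-up rectangle
(not the real extent of Jamaica Bay), the station names and positions
are hypothetical, and distances use a local flat-earth approximation
that is only reasonable for small areas.

```python
import math

def to_metres(lon, lat, lat0):
    """Project lon/lat to local metres (equirectangular approximation)."""
    k = 111_320.0  # approximate metres per degree of latitude
    return (lon * k * math.cos(math.radians(lat0)), lat * k)

def edges(ring):
    """Yield consecutive vertex pairs, closing the ring."""
    return zip(ring, ring[1:] + ring[:1])

def point_in_ring(p, ring):
    """Ray-casting test: is point p inside the ring?"""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in edges(ring):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def dist_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    t = 0.0
    if seg2 > 0:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def stations_near(stations, outline_lonlat, buffer_m=500.0):
    """Names of stations inside the outline or within buffer_m of it."""
    lat0 = sum(lat for _, lat in outline_lonlat) / len(outline_lonlat)
    ring = [to_metres(lon, lat, lat0) for lon, lat in outline_lonlat]
    found = []
    for name, lon, lat in stations:
        p = to_metres(lon, lat, lat0)
        near = point_in_ring(p, ring) or min(
            dist_to_segment(p, a, b) for a, b in edges(ring)) <= buffer_m
        if near:
            found.append(name)
    return found

# Hypothetical outline (lon, lat) and hypothetical stations.
outline = [(-73.90, 40.58), (-73.78, 40.58), (-73.78, 40.65), (-73.90, 40.65)]
stations = [
    ("A (inside)", -73.85, 40.61),
    ("B (about 300 m north)", -73.85, 40.6527),
    ("C (about 2 km north)", -73.85, 40.668),
]
found = stations_near(stations, outline)
print(found)
```

With the area feature available, the whole query reduces to a
containment test plus a distance test against the outline. Without the
feature there is no outline to test against.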

> This:
>
> https://www.openstreetmap.org/relation/9359806
>
> is not a named representation of a verifiable element of the geography,
> it is a labeling geometry.  Creating such is not mapping, it is label
> drawing or label painting.  It is neither meant nor suited to do
> anything other than performing a relatively simple label placement.

I agree that what you show is a horrible example of mapping, and an
excellent example of what we need to discuss in terms of data
representation.

We have that at one extreme, a case where almost all the boundaries
are indefinite.  Nevertheless, the Drake Passage has some sort of
existence. If a map user reads the sentence, 'The _Nancie Belle,_
having survived the perilous journey through the Drake Passage, turned
to the north and made for Buenos Aires,' then that user might well
ask, "Where's the Drake Passage?". Should OpenStreetMap contain the
information needed to answer that question, perhaps through a
Nominatim query?  If so, in what form should that information be
represented?  Ought that information be in such a form that a map of
the Southern Ocean can render it competently, or is that rendering not
a job for OSM?

It is also worth considering that the label placement in our current
renderer is primitive; it's condensed, upright typography, placed on a
single point. I've put several of these projects 'on hold' for health
reasons, but I've intermittently been investigating the possibility of
developing better renderings - ones that could letter 'Skagerrak'
following the curve of the strait, or 'Red Sea' on a nearly vertical
line near the medial axis of the waterbody. While the Skagerrak, if
mapped as an area, would be nearly as bad as the Drake Passage, the
Red Sea has quite a well-defined boundary, with relatively tiny
indefinite borders at Suez and Perim. I therefore foresee the
possibility that we will need some sort of representation of the
approximate shape of indefinite objects, including precise
representations of the definite portions of their borders. An example
that wouldn't require addressing the indefinite object problem, but
might be used as a proof of concept, might be the Cannonsville
reservoir https://www.openstreetmap.org/way/134694145.  If you look at
https://caltopo.com/l/JE24, you can see how a skilled human
cartographer placed labels on several map sheets, following the twists
and turns of the water lying in its tortuous valley. I'd like to bring
at least one OSM-compatible rendering closer to that standard - what
we have today is only barely serviceable. For indefinite objects,
simply placing points at the centroids is not going to get me there.
(It's a very, very hard problem, and I don't expect to come even close
to the human standard, but I expect that I can improve considerably on
today's OSM.) I'd propose that we could make progress starting by
investigating the rendering on objects like the Cannonsville
reservoir, while simultaneously trying to develop the data model on
more nearly-definite objects such as Jamaica Bay (smallish) and the
Red Sea (large), then move on to objects with larger indefinite
boundaries.
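
One small, concrete reason that centroid placement falls short for
winding waterbodies can be sketched in a few lines of Python. The
U-shaped polygon below is a made-up stand-in for a reservoir in a
tortuous valley, not real data; the point is only that the area
centroid of such a shape can land outside the shape altogether.

```python
def centroid(ring):
    """Area centroid of a simple polygon via the shoelace formula."""
    a2 = cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        cross = x1 * y2 - x2 * y1
        a2 += cross                  # accumulates twice the signed area
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return (cx / (3.0 * a2), cy / (3.0 * a2))

def point_in_ring(p, ring):
    """Ray-casting test: is point p inside the ring?"""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical U-shaped waterbody: a 3 x 3 square with a notch cut
# out of the top between x = 1 and x = 2.
u_shape = [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]

c = centroid(u_shape)
print(c, point_in_ring(c, u_shape))
```

Here the centroid falls in the notch, so a label anchored on it would
sit on dry land. A placement that follows the medial axis of the shape
does not have that failure mode, which is part of why I think it is
worth the extra effort.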

Is OSM the right repository for this sort of information describing
approximate shape?  If not, how do we integrate it with the correct
repository without requiring mappers to enter and maintain the precise
boundaries redundantly, and with visibility so that mappers in OSM do
not damage these objects in the other repository?

Until and unless we address those questions, which have technology
and policy intermeshed, we're going to see this sort of gaffe appear,
because these geographic features that you dismiss as 'purely social'
still have, and need, names to refer to them.  Names are purely social
to begin with! Nothing has a name in the absence of humans needing to
communicate about it.

We may also have different standards of 'verifiability'.  Few
physiographic objects have their names painted on them in the field.
https://www.openstreetmap.org/node/357560681 has nothing to identify
it (except that the name happens to be written on the cover of the
climbers' log book at the summit), but the mountain is unquestionably
there, and if you go to Ashokan village, point across the reservoir at
it, and ask "what mountain is that?", you'll either get, "sorry, I
don't know its name," or you'll hear, "That's Friday Mountain".  If
you take the strictest possible definition of 'verifiable' being
'someone dropped in the area, possessed of no local knowledge, could
recover the information simply by observation without talking to
anyone nor using any reference source,' it's not 'verifiable', but a
great many features are not verifiable to that strict level, and the
map would be much poorer without them.

> Note by speaking of "label painting" i do not intend to assign one sided
> blame to mappers for doing so.  In most cases this is as much the fault
> of map designers encouraging this as it is of mappers to respond to
> this incentive.

Don't blame the designers - it's a human desire to see and work with
the names of things, however indefinite or controversial.

In any case, let's not throw away the baby with the bathwater.  Can we
solve the problem for less problematic objects like Jamaica Bay and
Cape Cod, see how the map adapts, and move forward to larger ones
(more difficult for servers, data consumers and mappers alike) and
more controversial ones (which strain the existing data model)?
Simply decrying the examples that have been done badly is not going to
make progress. Mappers and consumers alike want to work with named,
partially indefinite areas. Simply throwing out lots of examples of
areas that have been done wrong, without even any suggestion of how
they might be done right, is not going to solve the problem.
Expecting a solution that addresses all the corner cases right out of
the gate will tie us up in 'analysis paralysis.' Can we at least come
up with some concrete suggestion for those indefinite objects that
wouldn't strain the servers, and that exist in milieux whose cultures
are at least sufficiently homogeneous to agree on object identity?
That's a rather large set, and would at least let us prove out a
solution for part of the problem.

> > The "hard to work with" argument is what I said is a technological
> > limitation.
>
> With "hard to work with" i was referring to work for the mapper in
> maintenance, editing and also just dealing with the object being in the
> way when editing other things.  That is not a technological limitation.
>
> When you talked about technological limitations you were referring to
> problems of data users.

I was referring to problems of data users and mappers alike. Better
tools for the mapper can certainly mitigate many of these problems. A
related example: I know that I've done cleanup work at the request of
others who've broken multipolygon relations with shared boundaries.
They struggled with tools such as iD and Potlatch to do things that I
could do in moments with JOSM or Merkaartor. There are some map
features that are simply going to be the domain of 'power users' - and
that may well be entirely all right. I'd not expect a newcomer to deal
with a complex topology like
https://www.openstreetmap.org/relation/6362702 either - using any of
today's tools - but that's where the boundaries are, so we have to
live with them.

> I am glad you understand the problem.  If you now look at examples
> outside the United States (where if i may say so the originally
> different cultures have been largely "homogenized" a long time ago) you
> will realize that the situation is often not that simple in other parts
> of the world.  The fact that people from more than a hundred countries
> from all over the world with very different cultures, world views and
> languages in OSM work together in collecting local knowledge despite in
> many cases not even being able to verbally communicate with each other
> is quite remarkable.  But this amazing cross cultural cooperation
> hinges on the local verifiability of those things people map.
> Adding large scale concepts to the database that are not verifiable
> based on local knowledge means throwing a wrench into the gears of this
> amazing machine.

Yes, I do recognize the difficulty. I still strongly suspect that
there is significant value to be found even if we confine our
investigation to areas where there is enough homogeneity at least to
conserve object identity. Can we start with the easier examples and
see how far we get before the model breaks?  How would we approach
Jadebusen? Jamaica Bay? The Gulf of Aqaba? The Waddenzee?
Kvinnheradsfjorden?  Those are examples of the sort of things for
which approximate shape would be useful, and that have a mixture of
definite and indefinite boundaries.

Similar land features to address as examples might include Cape Cod,
the Delmarva Peninsula, the Mull of Kintyre and Zeeland.  Again, shape
information for these indefinite features would be tremendously
helpful. (If I've chosen bad examples outside North America, feel free
to suggest better ones!)

I've refrained from mapping anything in this vein because of the
controversies. Can we at least agree that the desire to assign a name
and approximate shape to area features, a small part of whose boundary
is indefinite, is a legitimate desire, and not simply belittle
mappers, data consumers, and users for wanting to work with those
features? Can we recognize that a project to fit that sort of object
into our data model could be contemplated, even if we have to confine
scope initially to features that lie in regions of enough homogeneity
that the locals agree on object identity? I don't wish to minimize the
other issues. I'm stating that they're harder than the issue of
partially-indefinite regions, and further conjecturing that solving
the easier problem might both give us something useful a little
quicker, and might yield insights that could help crack the tougher
nut.

We're never going to resolve the political and philosophical issues,
but can we at least try to address a few concrete examples? Until we
can arrive at sound worked examples, we won't be able to stop the
onslaught of ill-considered mapping.