[OSM-talk] Updating of land/water polygons (based on natural=coastline) is too slow and unreliable

Jochen Topf jochen at remote.org
Sun Nov 22 14:16:58 UTC 2020

Hi Joseph,

thanks for bringing this issue up here. As one of the parties involved, I'll
try to sort out the several issues I see regarding the coastline and then try
to answer your questions. Sorry for the long post, but I believe quite some
context is needed to understand the relevant issues here.

1. Where is the coastline?

You can have lots of different definitions of what a "coastline" is and based
on those definitions the coastline is in different places. That's okay, because
OSM has free tagging. Everybody can invent new tags and you can tag the
coastline according to definition X here and according to definition Y there.
When OSM started, people looked for a reasonably easy-to-apply and useful
definition that would work everywhere. And of course we didn't invent this
definition from scratch, but based it on what other people and other maps
have been doing for a long time.

You have basically three kinds of things involved here: The "sea", the "land",
and "inland water". I think I don't have to explain what the difference between
"land" and the other categories is; you just have to figure out where to draw
the line (OSM uses the mean high water line), but that's essentially a detail.

The difference between "sea" and "inland water" is a bit more fuzzy, but not
really that complicated. "Sea" is usually salty and usually tidal, "inland
water" is usually fresh water and not tidal. On river mouths you have to decide
where to draw the line and that's something local mappers can decide based on
local discussions and sometimes it comes down to their gut feeling whether the
water area is more river-like or more sea-like. There is a lot of room here to
discuss how to interpret the rules in a specific case, but that doesn't change
the rules themselves.

2. Why is it important to agree on one definition?

The definition is not something which can be decided locally, because
coastlines are a worldwide phenomenon and they show up on basically all maps.
The first thing most maps do is draw the coastline (or the land or sea area,
respectively). On small zoom levels where you don't see much detail, the
one thing that you always see is the coastline. And when large water areas are
not tagged as coastline, you see the difference. I don't particularly care
whether you put the line between the sea and the river a few meters or
kilometers up or down the river. That's certainly something where we are in
that fuzzy area which local mappers can agree on and decide. But the bay at the
mouth of the Rio de la Plata and the Chesapeake Bay are way beyond anything we
can see as inland water by the agreed upon definition. And they are both
clearly visible on a world map.

This is not only important for maps, but even more so for other uses. If you
create statistics based on how many square kilometers of sea there is or
something like it, you don't want this definition to change under you.

After I started a discussion of the problems with the recent coastline changes
in the Chesapeake Bay (see https://www.openstreetmap.org/changeset/94093155),
these points have also been discussed on the tagging mailing list (starting
here: https://lists.openstreetmap.org/pipermail/tagging/2020-November/056310.html)
and I believe my opinion here agrees with what everybody said there in answer
to Eric's posts.

3. Why is it so difficult to change that definition?

OSM has used this definition for a long time. Thousands of maps and other uses
of the data have come to depend on this definition. I believe we cannot change
it now in any substantial way, because we would break the data for many, many
uses. We have free tagging, so we can always amend the data. We do this all the
time, leaving the definition of an existing tag as it is but adding extra tags
to define some details more clearly. This doesn't change anything for existing
users, and map creators and other users who want more control over the details
can use the extra tags. So if you want to invent some tags and put them on a
multipolygon for a bay so that the bay can be labelled properly, that's totally
okay in my opinion. You are *adding* to the map that way. But you can't take
away something that has already been agreed on.

In the end there are essential and often used tags, like the coastline or, say,
the "highway" tag, that are by now essentially unchangeable. Unfortunately we
don't have a mechanism in OSM to make larger changes to our data schema in an
orderly and controlled fashion that would allow these kinds of changes. But
we, as a community, do have a responsibility towards the users of OSM to not
"break their world".

4. Coastline Processing

Now we are getting to the stuff Joseph is talking about: There is an extra issue
involved in any discussion of the coastline that makes everything slightly more
complicated and that many newer people might not know about.

The way we process the natural=coastline tag for generating maps, or any other
use, is different from how most other OSM data is processed. Normally, if we
want to map a large area, we'd use multipolygon relations for that. So we could
have multipolygons for "land" and for "sea". But those multipolygons would be
covering the whole world, so huge as to become unmanageable. Instead we use
ways to define the coastline and define the land to be on the left side and the
water to be on the right side of those ways. There is a special piece of
software (that I wrote) that does this processing and a service (at
https://osmdata.openstreetmap.de/) that I run where this data is processed once
a day and everybody can download it. This is where the land/water polygons come
from that most OSM maps use.
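
To make the left/right rule concrete, here is a minimal sketch in Python. It is
only an illustration, not code from the processing software mentioned above;
the function names are invented for this example and coordinates are treated as
planar x/y (x = longitude, y = latitude), which is an assumption that ignores
everything that makes real-world coordinates complicated.

```python
def signed_area(ring):
    """Shoelace formula for a closed ring given as (x, y) tuples.

    The first and last point must be identical. A positive result means
    the nodes run counter-clockwise, a negative one clockwise.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def encloses_land(ring):
    """With land on the left of the way, a counter-clockwise closed ring
    has land on its inside (an island); a clockwise ring encloses water."""
    return signed_area(ring) > 0


island = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]  # counter-clockwise
water_ring = list(reversed(island))                # same nodes, clockwise
```

Reversing the direction of a way therefore swaps land and water, which is one
reason a single wrongly oriented way can have such a large effect.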

Now this processing (turning a coastline way into polygons) only works if there
are no errors in the coastline. If the coastline crosses itself or something
like that, the processing will break. The software can fix some smaller errors
automagically, but it can't do that for everything. (This processing also
produces the data behind https://tools.geofabrik.de/osmi/?view=coastline that
helps the community to fix any errors in the coastline.)
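
As an example of the kind of error meant here, a way that crosses itself can be
detected with a simple segment intersection test. The following Python sketch
is an illustration only, with invented function names; it is a brute-force
check over planar coordinates and not how the actual software or the error
checker linked above work.

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p):
    1 for a left turn, -1 for a right turn, 0 if collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def segments_cross(a, b, c, d):
    """True if segment a-b and segment c-d properly cross.
    Shared endpoints and collinear overlaps are ignored in this sketch."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)


def crosses_itself(way):
    """Brute-force O(n^2) check over all pairs of non-adjacent segments."""
    segments = list(zip(way, way[1:]))
    for i in range(len(segments)):
        for j in range(i + 2, len(segments)):
            if segments_cross(*segments[i], *segments[j]):
                return True
    return False


bowtie = [(0, 0), (2, 2), (2, 0), (0, 2)]          # first and last segment cross
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]  # valid closed ring
```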

Any change anywhere can affect the coastline processing of the whole world. The
way the processing works, all the data for the whole world is collected and
processed as a whole. This is technically much easier than trying to process
pieces of the coastline for different parts of the world. (This certainly could
be improved upon, but it is what we have now and nobody seems to have the time
to improve this process.)

Years ago we had, again and again, problems where sometimes whole continents
would "flood", because the processing had not been able to properly build those
land polygons. We have solved the "flooding" problem by adding a check that
compares the world map generated today to the one from yesterday. If they look
too different (there is some limit; I don't know offhand how many pixels of
difference is too much), processing is halted and the new coastline is deemed
"suspicious". You can see this in action on this map:
https://osmdata.openstreetmap.de/internal/coastline/ . Here the differences
between the last good map and the current coastline tagging can be seen. And,
as I write this, you can clearly see the difference for the Chesapeake Bay that
triggered this discussion.
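
The idea behind this check can be sketched roughly as follows. This is an
illustration in Python, not the actual implementation: the two maps are plain
nested lists (1 = land, 0 = water), and the function names and the threshold
`max_changed` are assumptions made up for this example.

```python
def changed_pixels(previous, current):
    """Count cells that differ between two equally sized rasters."""
    return sum(
        a != b
        for prev_row, cur_row in zip(previous, current)
        for a, b in zip(prev_row, cur_row)
    )


def is_suspicious(previous, current, max_changed):
    """True if more than max_changed pixels differ.
    max_changed is a hypothetical threshold chosen for this example."""
    return changed_pixels(previous, current) > max_changed


yesterday = [[1, 1, 0],
             [1, 1, 0],
             [0, 0, 0]]
small_edit = [[1, 1, 0],
              [1, 0, 0],
              [0, 0, 0]]
flooded = [[0, 0, 0],
           [0, 0, 0],
           [0, 0, 0]]
```

With a threshold of 2, the small edit would pass and the "flooded" map would be
held back for manual review.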

When this happens somebody (currently, again, me) has to look at the changes
and decide whether they are okay or not. Sometimes this is because something in
the real world changed (ice sheet melting anybody?) or better satellite images
became available somewhere or so. And sometimes, as in this case, this is
because of some large scale tagging changes.

Okay, so who made me the "guardian" of that decision? That just happened,
because I saw the need for better software for this processing, so I wrote
that, and the need for a reliable service to serve that data, so I set one up.
I don't *want* to be responsible for that, but so far nobody else has been
willing to take on that responsibility.

But as long as I have that, I take this responsibility seriously enough that I
will not release a coastline that is, in my opinion, wrong on a large scale.
And it doesn't matter if that is due to a mapping error or a deliberate change
in tagging. I want to protect the users of the data I generate from changes
that would break their map and I want to protect OSM from changes that would
make it less useful.

This has the unfortunate side-effect of annoying mappers, because this is the
big problem here: Because of the way the processing is done as I described
above, anything that breaks the map somewhere halts the update and all the
legitimate changes to the coastline will not show up either.

Now I made a big mistake earlier this year when somebody changed the tagging at
the mouth of the Rio de la Plata, moving the coastline to the outside of the
bay. I saw that change and did nothing, believing that somebody would fix the
coastline in short order and we'd be back to normal. But this didn't happen.
Nobody changed it back for months, and I didn't have the time and energy to
intervene, fix it myself or start a discussion. Eventually somebody reverted
the coastline and the processing became unstuck, but it took several months.
Ignoring this for so long was an error on my part and I apologize for that.
While I did take my responsibility to not break the map for the users
seriously, I also have a responsibility to let updates of the map get to the
users. This time, as mentioned above, I started a discussion on the changeset
when I noticed the problem.

So when Joseph writes that people have been retagging the coastline because
the processing is not fast enough, that gets the causality wrong: Because
people have been retagging the coastline, the processing stops. When there are
no major problems, changes show up within a day, which is slower than other
changes mappers make, and that's certainly not ideal.

Can we improve this process? Sure, we can run it more than once a day. But that
wouldn't help when it gets stuck because of large changes. It would help if
the process could somehow stop the large changes in one area while letting
changes in other areas proceed. This would give us some time to fix breakages
and to discuss larger changes while not annoying mappers everywhere else. So if
somebody can come up with a way of doing this coastline processing in a better
way and actually implement it, this would be a great thing in my view.
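
Just to illustrate the idea, not as a worked-out design: one could imagine
running the difference check per region and keeping the last good version only
where the change is too large. The Python sketch below assumes the world is
split into tiles identified by some id, each tile being a small raster (nested
lists, 1 = land, 0 = water); the function name, the tile ids and the threshold
are all made up. It ignores the hard part, namely that land polygons cross tile
boundaries and have to fit together afterwards.

```python
def pick_tiles(last_good, candidate, max_changed_per_tile):
    """Choose, tile by tile, between the last good and the new raster.

    last_good and candidate map a tile id to a raster (list of rows).
    Returns (result, held_back): result uses the new tile where the change
    is small and the last good tile otherwise; held_back lists the ids of
    the tiles that need a manual look.
    """
    result, held_back = {}, []
    for tile_id, old in last_good.items():
        new = candidate[tile_id]
        changed = sum(
            a != b
            for old_row, new_row in zip(old, new)
            for a, b in zip(old_row, new_row)
        )
        if changed > max_changed_per_tile:
            result[tile_id] = old          # keep last good version
            held_back.append(tile_id)
        else:
            result[tile_id] = new          # let the update through
    return result, held_back


last_good = {"a": [[1, 1], [1, 1]], "b": [[1, 0], [0, 0]]}
candidate = {"a": [[0, 0], [0, 0]], "b": [[1, 1], [0, 0]]}
result, held_back = pick_tiles(last_good, candidate, max_changed_per_tile=2)
```

Here tile "a" changed completely and is held back, while the small change in
tile "b" goes through.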

But in the end this is less an issue of the tools, but more of an issue of how
the community organizes itself. We have organized editing guidelines and import
guidelines for a reason and this case isn't that much different. If you want to
do large scale edits that affect a lot of people, you have to discuss them
beforehand in a suitable forum. And if you don't do that, we have an
established procedure for handling that: The DWG can step in and revert
the change and then we can have that discussion. In the case of the changes
in the Chesapeake Bay the mappers seem to have discussed those changes on a
Slack channel somewhere, probably not realizing how large the change was
that they were planning and how many people it would affect, so they chose
a forum that wasn't suitable for such a large-scale change. We, as a
community, could try to better define what kind of changes must be
discussed where. But sometimes it is hard to see what consequences some
change will have, so these things will happen again, and that's why it is good
to have some safeguards built in, like the change check in the coastline
processing. I just wish it wasn't me having to deal with that, but some kind
of community process.


On Sat, Nov 21, 2020 at 10:37:22AM -0800, Joseph Eisenberg wrote:
> Date: Sat, 21 Nov 2020 10:37:22 -0800
> From: Joseph Eisenberg <joseph.eisenberg at gmail.com>
> To: osm <talk at openstreetmap.org>
> Subject: [OSM-talk] Updating of land/water polygons (based on
>  natural=coastline) is too slow and unreliable
> I just found out that mappers in the east coast of the USA have been
> converting coastal bays and tidal channels to natural=water areas because
> they don't like how long it takes to get updated land/water polygons based
> on the natural=coastline ways.
> See the comments on this changeset, where Pamlico Sound (a large area of
> water at the edge of the Atlantic Ocean, inside of a line of barrier
> islands - comparable to the Waddenzee in the Netherlands) was changed to a
> natural=water polygon with the natural=coastline removed.
> "That was the reason we started removing the smaller estuaries from the
> coastline, so edits to them would show up on the map in a timely fashion. "
> Unfortunately to get faster re-rendering times, mappers are mis-tagging
> these areas which should be outside of the coastline.
> Is there any way we can improve the process of checking and updating the
> water and land polygons, currently available on
> https://osmdata.openstreetmap.de - so that mistakes do not lead to
> multi-week waits for new polygons? Right now the last update was 11/11/2020
> - ten days ago.
> Is there any way of getting updates more often than once a day in the
> best-case scenario when nothing is broken?
> -- Joseph Eisenberg


Jochen Topf  jochen at remote.org  https://www.jochentopf.com/  +49-351-31778688
