[OSM-talk] Looking Forward

Sat Dec 24 13:02:39 GMT 2011

Fellow OSMers, 

   I'd like to use the calm end-of-year season to say a few things I
consider important, hoping that I might get a few people to spend a
thought or two on them.

I've been accused in the past of not having a vision for OSM and I
don't claim to. But it would be great if those who have a vision would
also have answers to the issues I want to talk about.

Our admins have recently published a list of "top ten tasks",
technical things they'd like to see implemented sooner rather than
later. Looking at that list makes you think: If *those* are the biggest
problems then the project must be working quite well! But in fact
nobody claims that they are the biggest problems: they are just the
lowest-hanging fruit.

The OSMF is very much concerned about making sure that user experience
is improved and that it becomes easier for everybody to contribute to
OSM, shedding our image of being a project for geeks and enthusiasts,
and I've heavily contested that on osmf-talk. The underlying idea seems
to be that of any web startup: Once we achieve growth everything else
will fall into place. And who's to blame them - it has worked for OSM
for quite a while. Indeed among the more technical-minded people in OSM
the making of big plans is usually frowned upon as "astronautism". (To
be fair, a project as big as OSM does indeed attract a fair number of
people who think that even without knowing anything about OSM, they can
surely tell us that we should use a certain trendy technology to
continue to work.)

All this together means that everybody - including myself - is more or
less sticking their heads in the sand when it comes to the really big
issues, issues for which we not only lack a solution, but where it is
even unclear how we could arrive at one and implement it.

(And I'm not even talking about the license change - that is, at least,
an issue where the path ahead is relatively clear and things just need
to get done.)

Here's my shortlist of "big issues" for our future:

1. Strict Tagging Rules

Not a week goes by without some discussion on a forum or mailing list
ending in a lament about the lack of strict tagging rules. Data users
hope to be able to find out that the feature they're looking for will,
if it exists, be tagged exactly so and so. Junior mappers want to know
how exactly to tag certain objects. Experienced mappers despair at
others changing their hard work according to some wiki "vote" that
attracted 15 participants out of tens of thousands.

The current wisdom is "we neither have nor want strict rules" - we have
got where we are now precisely because we did not waste time on trying
to come up with rules. Also, we are an international project with
diverse communities and there is no reason why everyone in the world
should tag the same. But still the issue remains, and a lot of valuable
time is wasted between mappers discussing the merits of one tagging
scheme or the other. Sometimes, edit wars ensue which then bind even
more resources in mediation.

Meanwhile, editor writers and bot programmers gain all the power - it
is, in effect, they who decide what gets tagged how (or auto-corrected
if the user should be so audacious as to use a tag differently).

This is a continuing source of friction. Our data is more valuable and
easier to use if it is easy to interpret; mappers could often benefit from
that as well. But tagging freedom, and the lack of a "central tagging
command" structure, have brought us where we are. In contrast to other
systems like Google Map Maker, our mappers are not just worker ants
expected to mind-numbingly fill in the blanks on a form; they can be
creative agents deciding what gets mapped and how. Out of despair some
people even call for a "tagging czar" who would make a decision
whenever the community doesn't arrive at one. This doesn't look like a
good solution to me but it illustrates the problem.

The project changes, and the bold and autonomous mappers of the first
few years (who often had a whole city to themselves) are in decline; we
have more people who actually *want* to be told what to do. But with
many of the sensible people in our project being from that bold
generation and not wanting to create rules for others, we end up with
rules being made by people with strange ideas whose main spare time
activity seems to be rule-making rather than mapping, and voted on by
people who cannot grasp the consequences.

2. Imports and the Community

Governments all over the world are opening up their data. OpenStreetMap
was born from frustration: "You don't give us your data, ok, then we
collect it ourselves." - More and more, governments are giving out
their data, and many people less familiar with OSM's tradition of
surveying think that this must be a boon for the project and wildly
import any government data they can find.

The point has been made that imports damage a community, or even keep a
healthy community from forming in the first place. There already is
more free and open geodata in the world than we could ever deal with;
how can we make sure that OSM is not ruined by importing more than we
can digest?

A stock answer for the "I want to import XY into OSM" issue is "put it
in a separate database and merge when rendering" and this might indeed
be sensible for many areas, but it would require easier ways for
merging data and also ways to see these other data sets while editing;
but most of all we would have to get the message across that OSM is not
intended to be a "melting pot" of the world's geodata. 

3. Level of Detail, Relation Overload

Everyone who downloads OpenStreetMap data gets the same stuff, whether
they want it or not. This is especially true for people wanting to edit
OSM, but also for data users. You always get the full detail, down to
every last post box and garden shed. In the traditional GIS world this
is different; data is usually produced in several scales (see e.g.
naturalearthdata.com) or, where one master map exists, downscaled from
that using sophisticated algorithms or even manual work.

The only application in which we have something comparable is tiles -
of course no attempt is made to draw residential roads on zoom level
10. But other than that, everyone always has to deal with the full
amount of data we have. With ever-increasing depth of detail, the
"50,000 node" download limit for editing means that the area you can
download for editing becomes smaller and smaller, and even if you could
download more it would be hard to handle in the editor. 

OSM prides itself on not having Wikipedia's much-hated and hotly
debated "relevance criteria" - while we do sort of expect that you
don't enter data that you cannot maintain, we largely allow people to
add whatever data they are interested in. This has potential to cause
trouble especially when coupled with imports - recent discussion
revolved around 3D building data, about drawing roads as areas, about
historic information, or about certain streets by now being members
of 30-odd relations because there are that many bus lines. This is
despite the overwhelming majority of OSM contributors not being
interested in 3D buildings, historic data, or public transport
information. You can choose what to display on a map, but you cannot
choose when editing; the mapper always has to deal with the full depth
of information. There are limited options for "hiding" stuff in editors
but that only goes so far and carries the risk of breaking implied
spatial relationships.

We must find ways for people to deal with one topic to their heart's
content without impacting the work that others do; otherwise any work
we do to make editing simpler will be nullified by the sheer amount and
complexity of data.

4. Data Model Changes & Technology

Our data model is reasonably simple but it does have its quirks, and
they are beginning to show. More than half of our ways are used to model areas
(which is mainly due to imported building data), yet we don't have a
proper area data type. This is being discussed but no solution is
apparent; API or data model changes used to be something that one could
simply decide at a hack weekend but with an ever increasing number of
users and data consumers it becomes more demanding. We manage to paper
over any cracks in our data model by using relations but they are
hopelessly under-specified and often hard to evaluate automatically
(witness e.g. discussions about using cascading relations for
boundaries). Relations often make editing difficult, and they sometimes
have several thousand members and several thousand historic versions
which makes them hard to handle in all respects. But simply ignoring
them or even dropping support for them becomes less and less of an
option.
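
To illustrate what the lack of an area type means for data users, here
is a minimal sketch of the kind of guesswork a consumer has to do: a
way only "is" an area if it is closed and its tags suggest so. The
function name and the AREA_KEYS set are made up for this example and
are nowhere near complete; real software uses much longer lists.

```python
# Illustrative only: a tiny, incomplete set of keys that usually imply an area.
AREA_KEYS = {"building", "landuse", "leisure"}


def looks_like_area(node_refs, tags):
    """Guess whether a way is meant as an area (hypothetical helper)."""
    # A closed way starts and ends on the same node and needs at least
    # three distinct corners, i.e. four node references.
    closed = len(node_refs) >= 4 and node_refs[0] == node_refs[-1]
    if not closed:
        return False
    # An explicit area=* tag overrides any guess based on other keys.
    if tags.get("area") == "no":
        return False
    if tags.get("area") == "yes":
        return True
    # Otherwise fall back to the (incomplete) list of area-ish keys.
    return any(key in tags for key in AREA_KEYS)


ring = [1, 2, 3, 4, 1]
print(looks_like_area(ring, {"building": "yes"}))                 # True
print(looks_like_area(ring, {"highway": "residential"}))          # False
print(looks_like_area(ring, {"highway": "pedestrian", "area": "yes"}))  # True
```

The same closed ring is a house in one case and a circular road in
another, and only the tags tell them apart.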

Also, the way we distribute database updates - the daily, hourly,
minutely diffs - is optimized for keeping a fully replicated OSM
mirror, but less suitable for keeping regional extracts (as you always
download and process a world-wide diff and then discard most of it),
and practically unsuitable for keeping thematic extracts (because a tag
added to a way might suddenly make that way interesting for you but you
lack information about the nodes required).
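
The thematic extract problem can be sketched in a few lines. This is
not the real diff format or any existing tool; the Way and Extract
classes and apply_way_change() are invented for illustration, and the
only assumption taken from above is that a diff carries the changed
way with its node references, but not the unchanged nodes themselves.

```python
from dataclasses import dataclass, field


@dataclass
class Way:
    id: int
    node_refs: list
    tags: dict = field(default_factory=dict)


@dataclass
class Extract:
    nodes: dict = field(default_factory=dict)  # node id -> (lat, lon)
    ways: dict = field(default_factory=dict)   # way id -> Way


def apply_way_change(extract, way, wanted):
    """Apply one modified way from a diff; return ids of nodes we lack."""
    if not wanted(way.tags):
        # The way is not (or no longer) part of our theme: drop it.
        extract.ways.pop(way.id, None)
        return []
    extract.ways[way.id] = way
    # The diff only lists node ids; coordinates must already be known.
    return [ref for ref in way.node_refs if ref not in extract.nodes]


def is_cycleway(tags):
    return tags.get("highway") == "cycleway"


extract = Extract()
# Way 7 used to be tagged as a path, so the extract never stored it or
# its nodes. Now someone retags it and only the way appears in the diff.
changed = Way(id=7, node_refs=[101, 102, 103], tags={"highway": "cycleway"})
missing = apply_way_change(extract, changed, is_cycleway)
print(missing)  # [101, 102, 103]
```

The extract now knows that way 7 exists but has no geometry for it,
and the diff alone gives no means of getting it.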

                                 ----

Some people seem to assume that if only we grow big enough then all
these problems will solve themselves somehow. Maybe we can simply
outsource problem solving to paid coders from some web platform, or
OSMF could hire people and tell them to fix things somehow. But it
seems to me that while these problems may have a technical component,
they are very strongly linked to how we work in OSM - the mappers, the
technologists, the sysadmins, how everything goes together - and
solving them requires a good way of decision making in the community. I
don't think that e.g. OSMF is the body that can or should be burdened
with that decision making; but I don't have an alternative either. We
like to say "show working code or STFU" but one has to be honest and
admit that once problems reach a certain complexity, that basically
boils down to being in denial about the problem.

I'd love to find that all the problems I listed here turned out to be
non-problems in the end, and were somehow solved automatically along
the way. It wouldn't be the first time for me to think something is a
problem and OSM took it in its stride. Maybe mentioning the issues
already helps to achieve that magic.

Merry Christmas, or whatever you're having -
Frederik