[Tagging] better classification systems

Fri Feb 12 15:56:42 UTC 2021

I think the (or a) fundamental missing link is a shared understanding that the database is for computers; it is the job of the renderer to make the data useful for humans, and it is the job of the "editor" (in the broadest sense) to facilitate the mapping between the two worlds. A second fundamental missing link is the insistence on a single solution for the whole world, while the debate always rages on about what happens in *my* country.

Think of dates. Internally they are stored in a variety of ways, ranging from a count of microseconds since some fixed point in the past to standardised string formats that include time zones and summer time. No ordinary user needs to be bothered with those technical details: it's the job of the user interface to turn the stored value into something recognisable for the user, and to accept the notations that a user might employ as input.

How things are stored internally in OSM should be a secondary discussion. Primarily we should be discussing human-level things: what use cases do we expect to cover? What distinctions do we expect to be able to make based on the data? What do we want to be able to get out of OSM? As long as the discussions are focussed solely on what we want to put *in* to OSM, I don't think we will ever be able to rank options against each other, or to say "good enough is good enough". Experience suggests that this is already contributing to the stagnation that we see so often in these discussions.

Historically OSM has always wanted to make life easy for non-expert mappers and has tended to neglect the needs of data consumers. In the early years, when quickly achieving a critical mass was so important, I can see why that happened; however, OSM is mature now, and it is time to redress the balance and prioritise our "customers' needs", which are bound to include data quality and reliability.
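To make the date analogy concrete, here is a minimal Python sketch of one stored value and two human-facing views of it. The function names `render` and `parse` and the display format are my own choices for illustration, and fixed UTC offsets stand in for real time-zone handling (a real interface would use a time-zone database to get summer time right).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Machine-level storage: microseconds since the Unix epoch, always in UTC.
stored_us = 1_613_145_402_000_000

# Fixed offsets used for illustration only; they ignore summer time.
UTC = timezone.utc
CET = timezone(timedelta(hours=1), "CET")


def render(us, tz):
    """Turn the stored integer into a notation a person would recognise."""
    moment = EPOCH + timedelta(microseconds=us)
    return moment.astimezone(tz).strftime("%d/%m/%Y %H:%M:%S")


def parse(text, tz):
    """Accept the human notation and convert it back to the stored form."""
    local = datetime.strptime(text, "%d/%m/%Y %H:%M:%S").replace(tzinfo=tz)
    return (local - EPOCH) // timedelta(microseconds=1)


print(render(stored_us, UTC))  # 12/02/2021 15:56:42
print(render(stored_us, CET))  # 12/02/2021 16:56:42
```

The stored value never changes; only the presentation does, and `parse(render(stored_us, CET), CET)` gives back the same integer.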

OSM is unusual in that it tries to find global tagging for region-dependent concepts. Highway types, for example, use a list of values originating in the UK, and now we force every other country to fit into that system. Concepts like "unclassified" need explaining time and time again. We still don't have a mechanism for defining regional defaults or implications (like highway=motorway implies foot=no in many places) other than the odd note on a wiki page, and that doesn't lend itself to automated processing.
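As a rough idea of what a machine-readable version of such defaults could look like, here is a small Python sketch. Nothing like this exists in OSM today as far as I know; the region codes, the `PARENT` and `DEFAULTS` tables and the implied values are all hypothetical and only there to show the lookup order.

```python
# Hypothetical region hierarchy: each region names its parent.
PARENT = {"be": "europe", "europe": "world", "world": None}

# Hypothetical implications per region; the values are illustrative only,
# not agreed OSM defaults.
DEFAULTS = {
    "world": {"highway=motorway": {"oneway": "yes"}},
    "europe": {"highway=motorway": {"foot": "no"}},
    "be": {"highway=motorway": {"maxspeed": "120"}},
}


def effective_tags(tags, region):
    """Return the explicit tags plus everything implied for this region."""
    # Walk up from the most specific region to the root.
    chain = []
    while region is not None:
        chain.append(region)
        region = PARENT[region]

    implied = {}
    # Apply the broadest region first so narrower regions can override it.
    for name in reversed(chain):
        for condition, extra in DEFAULTS.get(name, {}).items():
            key, value = condition.split("=", 1)
            if tags.get(key) == value:
                implied.update(extra)

    # Tags actually present on the object always win over defaults.
    return {**implied, **tags}


print(effective_tags({"highway": "motorway"}, "be"))
```

A data consumer could then ask for the effective tags of an object without every mapper having to add foot=no by hand, and an explicit foot=yes on a particular object would still override the regional default.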

Clearly there is a big issue with landuse and landcover. There are almost certainly existing ontologies that can be employed for these concepts, but we have allowed ourselves to dig ourselves into a steep-sided pit because whoever started using the keys in ways like landuse=grass was not prevented from doing so. Such things need a subject-matter expert (SME; a geographer?) to start us off in the right way, to define what "landuse" really means, what a good high-level classification might be (10-20 categories max) and how to specialise if required. All the energy at the moment is being spent on climbing out of the hole.

There are continuous discussions about how to put addresses in OSM. It doesn't take long to discover that the UPU, the Universal Postal Union, has already defined a universal data model for addresses, and that for each country a mapping exists between the local address model and the canonical model (along with templates for layouts on envelopes etc). This universal data model may seem quite complex to some. I can handle it because I have an IT background, but many people may see it as over-engineered because they are only interested in the simple addressing used in (for example) Belgium. UPU S42 details here: https://www.upu.int/en/Postal-Solutions/Programmes-Services/Addressing-Solutions
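The general shape of "local model, canonical model, per-country template" can be sketched in a few lines of Python. To be clear, the canonical element names below are made up for illustration and are NOT the actual S42 element names, and the two templates are simplified guesses at common layouts, not the official UPU templates. Only the addr:* keys are real OSM tags.

```python
# Mapping from OSM address tags to a canonical model.
# The right-hand names are hypothetical, not the real UPU S42 elements.
OSM_TO_CANONICAL = {
    "addr:housenumber": "premises_number",
    "addr:street": "thoroughfare",
    "addr:postcode": "postcode",
    "addr:city": "locality",
}

# Simplified, illustrative layout templates per country code.
TEMPLATES = {
    "BE": "{thoroughfare} {premises_number}\n{postcode} {locality}",
    "GB": "{premises_number} {thoroughfare}\n{locality}\n{postcode}",
}


def to_canonical(tags):
    """Translate OSM addr:* tags into the canonical element names."""
    return {
        OSM_TO_CANONICAL[key]: value
        for key, value in tags.items()
        if key in OSM_TO_CANONICAL
    }


def render_address(tags, country):
    """Lay out an address for humans using the country's template."""
    # Raises KeyError if the template needs an element the tags lack.
    return TEMPLATES[country].format(**to_canonical(tags))


tags = {
    "addr:street": "Wetstraat",
    "addr:housenumber": "16",
    "addr:postcode": "1000",
    "addr:city": "Brussel",
}
print(render_address(tags, "BE"))
```

The point is that the mapper in Belgium only ever sees the simple local form, while the complexity of the canonical model stays behind the editor and the renderer.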

In summary, I think we need:
1) acceptance that the data is for computers and not intended for direct human consumption
2) more attention to what we want to get OUT of OSM and relatively less about putting stuff in there
3) SME oversight of tagging ontologies
4) a mechanism for defining a hierarchy of regional characteristics (defaults, implications etc)