[OSM-talk] Tagging schema

Mon Oct 5 00:12:41 BST 2009

I started mapping my local mountain village in Norway a month ago. I 
find the fundamental OSM data model very simple and elegant: The three 
basic elements (node, way/area, relation), and properties as key-value 
pairs. But I don’t like that free-form tagging has been elevated to a 
Religion.

As a mapper, I want a much more structured, well defined tagging scheme.

- When I use a key, I want to know precisely what type of value is expected.
- When I have entered a tag, I want to see in the editor immediately 
whether the tagging is valid according to an approved rule, a proposed 
rule, or nothing at all.
- When I spend time mapping, I want to know that the data I enter is 
useful, and can be used for rendering, for route planning, and for 
future interactive maps. Clearly tags must be according to a structure 
if that is going to happen.
- I want the editor tools to support and enforce the tagging structure.
- I want the tagging dialogs in editor tools to be easy to localise to 
different languages, attracting non-English-speaking mappers. I think 
more mappers will increase the probability of success for OSM, and 
hence increase my own motivation.
- I DO NOT WANT TO SEARCH TALK-MAIL ARCHIVES TO LEARN HOW TO TAG!

I think the tagging schema should be formally described. It should be a 
pragmatic mix of strict encoded values and free text values. It must of 
course be based on the existing (but loosely defined) tagging structure. 
The tagging structure must be represented as tables in the OSM database, 
along with an XML API.

Of course, such a scheme will not do away with the problem of 
classifying real world things. There will always be cases where it is 
difficult to classify what you see as a track or as a road, as a service 
road or as unclassified road, and so on. And making the Grand Unified 
Hierarchy will always fail at some places. But once I have selected an 
option, I want to know that it has a well defined meaning in the OSM System.

Some ideas off the top of my head:

Data types:

Every key should be assigned a type (or class). Could be:

- boolean
- enumeration
- numeric
- string
- free text.

Boolean is yes/no. Editor tools should present a Boolean key as a checkbox.

Many of the existing tags fall into enumeration. “highway” is an 
enumeration. An enumeration may often include an “unclassified” value 
(e.g. building=yes). Editor tools should present an enumeration as a 
listbox of some sort. The defined values should be stored in the OSM 
database. It should be possible to enter new values, but then the system 
should prefix the value with “proposed:”. If proposed values are approved 
later, an administrator can remove the “proposed:” prefix globally. 
Idea for editor tools: approved values marked green, previously proposed 
values yellow, other values red.
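
To make the enumeration idea concrete, here is a rough Python sketch of the green/yellow/red check and the “proposed:” prefix. The function names and the sample values are made up for illustration; nothing like this exists in the OSM API, and rewriting already stored data when a value is approved is not shown.

```python
# Hypothetical sketch: the value sets below are illustrative samples,
# not an authoritative list from the OSM database.
APPROVED = {"highway": {"motorway", "residential", "track", "cycleway"}}
PROPOSED = {"highway": {"proposed:bridleway_shared"}}

PREFIX = "proposed:"


def classify_value(key, value):
    """Return the editor colour for a value on an enumeration key."""
    if value in APPROVED.get(key, set()):
        return "green"   # approved value
    if value in PROPOSED.get(key, set()):
        return "yellow"  # previously proposed value
    return "red"         # anything else


def normalise_new_value(key, value):
    """Prefix values that are not approved with 'proposed:' before storing."""
    if value in APPROVED.get(key, set()) or value.startswith(PREFIX):
        return value
    return PREFIX + value


def approve(key, value):
    """Administrator action: move a proposed value to the approved set."""
    PROPOSED.get(key, set()).discard(PREFIX + value)
    APPROVED.setdefault(key, set()).add(value)
```

An editor would call classify_value() to colour each tag as it is typed, and normalise_new_value() before saving.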

Numeric is a decimal number. Editor tools should enforce digits only. 
Subclasses may be useful: numeric:meters, numeric:kilometres, 
numeric:currency. Currency obviously needs special handling. A toll fee 
should always specify which currency it refers to.
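
A possible sketch of the numeric subclasses in Python follows. The value syntax for currency (“25 NOK”, an amount followed by a three-letter code) is purely my assumption for the example, as are the names Amount and parse_numeric.

```python
import re
from dataclasses import dataclass

# Assumed syntax for numeric:currency values: "<amount> <3-letter code>".
_CURRENCY_RE = re.compile(r"(\d+(?:\.\d+)?) ([A-Z]{3})")
# Plain decimal number: digits with an optional fractional part.
_PLAIN_RE = re.compile(r"\d+(?:\.\d+)?")


@dataclass(frozen=True)
class Amount:
    value: float
    unit: str


def parse_numeric(subclass, raw):
    """Parse a raw tag value for a numeric subclass, or raise ValueError."""
    if subclass == "numeric:currency":
        match = _CURRENCY_RE.fullmatch(raw)
        if match is None:
            raise ValueError(f"currency needs amount and code: {raw!r}")
        return Amount(float(match.group(1)), match.group(2))
    if subclass in ("numeric:meters", "numeric:kilometres"):
        if _PLAIN_RE.fullmatch(raw) is None:
            raise ValueError(f"digits only expected: {raw!r}")
        # The unit is implied by the subclass name after the colon.
        return Amount(float(raw), subclass.split(":", 1)[1])
    raise ValueError(f"unknown numeric subclass: {subclass!r}")
```

With this, a toll fee entered as a bare “25” is rejected, while “25 NOK” is accepted.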

String is typically things like name, address, house number. Editor tools should 
present a single line text input. Telephone numbers, URIs, Wikipedia 
references may be modelled as a string, or as separate classes, or as 
subclasses of a string, to be discussed. Language versions may sometimes 
exist.

Free text is end-user-description, mapper-note etc. More or less 
complete sentences. Language versions may often exist. Editor tools 
should present a multiline, scrollable text input.
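
The five types and their editor widgets could be tied together roughly like this in Python. The key registry (which key has which type) is a made-up example, and is_valid() is only a sketch of the kind of check an editor could run. Enumeration values would need the value table and are not checked here.

```python
import re
from enum import Enum


class KeyType(Enum):
    BOOLEAN = "boolean"
    ENUMERATION = "enumeration"
    NUMERIC = "numeric"
    STRING = "string"
    FREE_TEXT = "free text"


# Editor widget suggested for each type.
WIDGET = {
    KeyType.BOOLEAN: "checkbox",
    KeyType.ENUMERATION: "listbox",
    KeyType.NUMERIC: "digits-only input",
    KeyType.STRING: "single-line text input",
    KeyType.FREE_TEXT: "multiline text input",
}

# Hypothetical key registry; the type assignments are examples only.
KEY_TYPES = {
    "lit": KeyType.BOOLEAN,
    "highway": KeyType.ENUMERATION,
    "ele": KeyType.NUMERIC,
    "name": KeyType.STRING,
    "description": KeyType.FREE_TEXT,
}


def is_valid(key, value):
    """Check a raw tag value against the type registered for its key."""
    key_type = KEY_TYPES.get(key)
    if key_type is None:
        return False  # unknown key: not according to anything at all
    if key_type is KeyType.BOOLEAN:
        return value in ("yes", "no")
    if key_type is KeyType.NUMERIC:
        return re.fullmatch(r"-?\d+(?:\.\d+)?", value) is not None
    if key_type is KeyType.STRING:
        return "\n" not in value  # single line only
    return True  # enumeration and free text are not checked in this sketch
```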

Comment BTW: Tag-typing can make tag-use-statistics more to the point: 
Statistics on the most used enumeration values is useful. Most used 
phone numbers/street names are less useful…

I think a multilevel key scheme should be formalised. ‘:’ seems to be a 
de-facto standard. Special purpose mapping should use a sub-key-space. 
Hiking, climbing, ornithology, agriculture, archaeology and so on are 
examples of special interest groups which should be assigned their own 
top-level key, to be defined by the groups themselves.

Some generic sub-level keys should be predefined for every key, like 
“note”, “description”, “source”, “fixme”. For example: key “ele” may 
have “ele:source”=“GPS”. “highway”=“cycleway” may have 
“highway:description”=“Sign says so, but lots of sharp bends and rough 
edges” (which can be said about 98.5% of Norwegian cycleways). Editor 
tools should allow entering informal information in addition to every 
formal key-value pair, but in a structured way.
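
As a small illustration of the multilevel keys, an editor could split off the generic sub-keys with ‘:’ as the separator, something like the Python sketch below. The helper names split_key and annotations_for are my own invention.

```python
# The four generic sub-keys suggested above.
GENERIC_SUBKEYS = {"note", "description", "source", "fixme"}


def split_key(full_key):
    """Split 'highway:description' into ('highway', 'description').

    Returns (full_key, None) when the last component is not a generic
    sub-key, so 'hiking:trail' stays a normal multilevel key.
    """
    base, sep, last = full_key.rpartition(":")
    if sep and last in GENERIC_SUBKEYS:
        return base, last
    return full_key, None


def annotations_for(tags, key):
    """Collect the informal annotations attached to one formal key."""
    result = {}
    for full_key, value in tags.items():
        base, sub = split_key(full_key)
        if base == key and sub is not None:
            result[sub] = value
    return result
```

An editor could then show the annotations next to the formal key-value pair they belong to, instead of as unrelated tags.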

Database:

In the OSM database/API there should be a table on keys:

- Key name
- Key type/class
- Short description in English (authoritative)
- Optionally a png/svg of the rendering

A language translation table on keys:

- Key name
- Language ID (ISO 639)
- Localised description

A table of literal values:

- Key name
- Literal value
- Short description in English (authoritative)
- Optionally a png/svg of the rendering

A language translation table on literal values:

- Key name
- Literal value
- Language ID (ISO 639)
- Localised name for value
- Localised description
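
The four tables could look roughly like this, sketched with Python's built-in sqlite3 module. The table and column names are my own guesses, the sample rows are made up, and the real OSM database would of course need something more thought through.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tag_key (
    key_name    TEXT PRIMARY KEY,
    key_type    TEXT NOT NULL,
    description TEXT NOT NULL,      -- authoritative English text
    rendering   BLOB                -- optional png/svg
);
CREATE TABLE tag_key_i18n (
    key_name    TEXT NOT NULL REFERENCES tag_key(key_name),
    lang        TEXT NOT NULL,      -- ISO 639 code
    description TEXT NOT NULL,
    PRIMARY KEY (key_name, lang)
);
CREATE TABLE tag_value (
    key_name    TEXT NOT NULL REFERENCES tag_key(key_name),
    value       TEXT NOT NULL,
    description TEXT NOT NULL,      -- authoritative English text
    rendering   BLOB,
    PRIMARY KEY (key_name, value)
);
CREATE TABLE tag_value_i18n (
    key_name    TEXT NOT NULL,
    value       TEXT NOT NULL,
    lang        TEXT NOT NULL,      -- ISO 639 code
    name        TEXT NOT NULL,
    description TEXT NOT NULL,
    PRIMARY KEY (key_name, value, lang),
    FOREIGN KEY (key_name, value) REFERENCES tag_value(key_name, value)
);
"""


def describe_value(conn, key, value, lang):
    """Return the localised description, falling back to English."""
    row = conn.execute(
        "SELECT COALESCE(t.description, v.description) "
        "FROM tag_value v LEFT JOIN tag_value_i18n t "
        "ON t.key_name = v.key_name AND t.value = v.value AND t.lang = ? "
        "WHERE v.key_name = ? AND v.value = ?",
        (lang, key, value),
    ).fetchone()
    return row[0] if row else None


# Illustrative sample data only.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO tag_key VALUES "
             "('highway', 'enumeration', 'Kind of road or path', NULL)")
conn.execute("INSERT INTO tag_value VALUES "
             "('highway', 'cycleway', 'Way designated for bicycles', NULL)")
conn.execute("INSERT INTO tag_value_i18n VALUES "
             "('highway', 'cycleway', 'no', 'sykkelvei', 'Vei for syklister')")
```

The point of the fallback in describe_value() is that an editor can always show something, even for languages where nobody has translated a value yet.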

Organisation:

OSM is a community of volunteers, so neither bureaucracy nor dictatorship 
is likely to be the way to go. I would guess that forking off a “tagging” 
mail group with a strict “keep-to-topic” policy would be the way to 
proceed. It could deal with tagging schema/policy in general, as well as 
core tagging, and assigning top level keys to other sub level tagging 
groups.

Well, it is time to get some sleep before work calls tomorrow. I am not 
going to implement any of this. I just hope these ideas can spawn some 
productive debate.

Best regards

Egil Hjelmeland