[OSM-talk] Tagging schema
Egil Hjelmeland
privat at egil-hjelmeland.no
Mon Oct 5 00:12:41 BST 2009
I started mapping my local mountain village in Norway a month ago. I
find the fundamental OSM data model very simple and elegant: The three
basic elements (node, way/area, relation), and properties as key-value
pairs. But I don’t like that free-form tagging has been elevated to a
Religion.
As a mapper, I want a much more structured, well defined tagging scheme.
- When I use a key, I want to know precisely what type of value is expected.
- When I have entered a tag, I want to see in the editor immediately
whether the tagging is valid according to an approved rule, a proposed
rule, or no rule at all.
- When I spend time mapping, I want to know that the data I enter is
useful, and can be used for rendering, for route planning, and for
future interactive maps. Clearly, tags must follow a structure for that
to happen.
- I want the editor tools to support and enforce the tagging structure.
- I want the tagging dialogs in editor tools to be easy to localise
into different languages, attracting non-English-speaking mappers,
because I think more mappers will increase the probability of success
for OSM, and hence increase my own motivation.
- I DO NOT WANT TO SEARCH TALK-MAIL ARCHIVES TO LEARN HOW TO TAG!
I think the tagging schema should be formally described. It should be a
pragmatic mix of strict encoded values and free text values. It must of
course be based on the existing (but loosely defined) tagging structure.
The tagging structure must be represented as tables in the OSM database,
along with an XML API.
Of course, such a scheme will not do away with the problem of
classifying real-world things. There will always be cases where it is
difficult to classify what you see as a track or as a road, as a service
road or as an unclassified road, and so on. And making the Grand Unified
Hierarchy will always fail in some places. But once I have selected an
option, I want to know that it has a well-defined meaning in the OSM System.
Some ideas off the top of my head:
Data types:
Every key should be assigned a type (or class). Could be:
- boolean
- enumeration
- numeric
- string
- free text.
Boolean is yes/no. Editor tools should present a Boolean key as a checkbox.
Many of the existing tags fall into enumeration. “highway” is an
enumeration. An enumeration may often include an “unclassified” value
(e.g. building=yes). Editor tools should present an enumeration as a
listbox of some sort. The defined values should be stored in the OSM
database. It should be possible to enter new values, but then the system
should prefix the value with “proposed:”. If proposed values are approved
later, an administrator can remove the “proposed:” prefix globally.
Idea for editor tools: approved values are marked green, previously
proposed values yellow, and all other values red.
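The colour idea above could be sketched roughly as follows. This is only
an illustration: the registries APPROVED and PROPOSED, their contents, and
the function names are assumptions made up for the example, not existing
OSM structures.

```python
# Hypothetical registries of known values per key. In the proposal these
# would be stored in the OSM database, not hard-coded in the editor.
APPROVED = {"highway": {"track", "service", "unclassified", "cycleway"}}
PROPOSED = {"highway": {"proposed:busway"}}

PROPOSED_PREFIX = "proposed:"


def normalise(key, value):
    """Return the value to store: new values get the 'proposed:' prefix."""
    if value in APPROVED.get(key, set()) or value.startswith(PROPOSED_PREFIX):
        return value
    return PROPOSED_PREFIX + value


def colour(key, value):
    """Map a stored value to the editor colour suggested in the text."""
    if value in APPROVED.get(key, set()):
        return "green"   # approved value
    if value in PROPOSED.get(key, set()):
        return "yellow"  # previously proposed value
    return "red"         # anything else
```

An editor would call normalise() when the mapper types a value and
colour() when drawing the tag list.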
Numeric is a decimal number. Editor tools should enforce digits only.
Subclasses may be useful: numeric:metres, numeric:kilometres,
numeric:currency. Currency obviously needs special handling. A toll fee
should always specify which currency it refers to.
String is typically something like a name, an address or a house number.
Editor tools should present a single-line text input. Telephone numbers,
URIs and Wikipedia references may be modelled as a string, as separate
classes, or as subclasses of a string, to be discussed. Language versions
may sometimes exist.
Free text is end-user-description, mapper-note etc. More or less
complete sentences. Language versions may often exist. Editor tools
should present a multiline, scrollable text input.
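As a rough sketch of what typed keys could mean for an editor, the
following checks a value against the type assigned to its key. The
key-to-type assignments in KEY_TYPES and all names here are hypothetical,
chosen only for the example. The numeric check accepts an optional sign
and decimal point, which is one possible reading of "decimal number".

```python
import re
from enum import Enum


class KeyType(Enum):
    BOOLEAN = "boolean"
    ENUMERATION = "enumeration"
    NUMERIC = "numeric"
    STRING = "string"
    FREE_TEXT = "free text"


# Hypothetical key-to-type assignments, for illustration only.
KEY_TYPES = {
    "lit": KeyType.BOOLEAN,
    "highway": KeyType.ENUMERATION,
    "ele": KeyType.NUMERIC,
    "name": KeyType.STRING,
    "description": KeyType.FREE_TEXT,
}

NUMERIC_RE = re.compile(r"-?\d+(\.\d+)?")


def is_valid(key, value):
    """Check a value against the type assigned to its key."""
    key_type = KEY_TYPES.get(key)
    if key_type is None:
        return False  # unknown key: nothing to validate against
    if key_type is KeyType.BOOLEAN:
        return value in ("yes", "no")
    if key_type is KeyType.NUMERIC:
        return NUMERIC_RE.fullmatch(value) is not None
    if key_type is KeyType.STRING:
        return "\n" not in value  # strings are single-line
    # Enumeration values need a value registry; free text is unrestricted.
    return True
```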
Comment BTW: Tag-typing can make tag-use-statistics more to the point:
Statistics on the most used enumeration values is useful. Most used
phone numbers/street names are less useful…
I think a multilevel key scheme should be formalised. ‘:’ seems to be a
de-facto standard separator. Special-purpose mapping should use a
sub-key-space. Hiking, climbing, ornithology, agriculture, archaeology
and so on are examples of special interests which should each be assigned
their own top-level key, to be defined by the special interest groups.
Some generic sub-level keys should be predefined for every key, like
“note”, “description”, “source”, “fixme”. For example: key “ele” may
have “ele:source”=“GPS”. “highway”=“cycleway” may have
“highway:description”=“Sign says so, but lots of sharp bends and rough
edges” (which can be said about 98.5% of Norwegian cycleways). Editor
tools should allow entering informal information in addition to every
formal key-value pair, but in a structured way.
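To make the sub-key idea concrete, here is a minimal sketch of how a tool
might attach the generic sub-level keys to the formal key they annotate.
It assumes ‘:’ as the separator; the function names and the shape of the
returned dictionary are invented for the example.

```python
# The generic sub-level keys named in the text.
GENERIC_SUBKEYS = {"note", "description", "source", "fixme"}


def split_key(key):
    """Split 'highway:description' into ('highway', 'description').

    Keys without a generic sub-level suffix are returned unchanged,
    with None as the second element.
    """
    base, sep, sub = key.rpartition(":")
    if sep and sub in GENERIC_SUBKEYS:
        return base, sub
    return key, None


def group_tags(tags):
    """Group informal sub-key values under the formal key they belong to."""
    grouped = {}
    for key, value in tags.items():
        base, sub = split_key(key)
        entry = grouped.setdefault(base, {"value": None, "extra": {}})
        if sub is None:
            entry["value"] = value
        else:
            entry["extra"][sub] = value
    return grouped
```

For example, group_tags({"ele": "1250", "ele:source": "GPS"}) puts the
source next to the elevation it describes, so an editor can show both in
one row.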
Database:
In the OSM database/API there should be a table on keys:
- Key name
- Key type/class
- Short description in English (authoritative)
- Optionally a png/svg of the rendering
A language translation table on keys:
- Key name
- Language ID (ISO 639)
- Localised description
A table of literal values:
- Key name
- Literal value
- Short description in English (authoritative)
- Optionally a png/svg of the rendering
A language translation table on literal values:
- Key name
- Literal value
- Language ID (ISO 639)
- Localised name for value
- Localised description
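The four tables above might look something like this. It is a sketch
using SQLite from Python purely for illustration; the table and column
names, the sample rows, and the lookup function are all assumptions, and
the real OSM database would need its own design.

```python
import sqlite3

# Hypothetical schema mirroring the four tables listed above.
SCHEMA = """
CREATE TABLE tag_key (
    name        TEXT PRIMARY KEY,
    type        TEXT NOT NULL,
    description TEXT NOT NULL,   -- authoritative English text
    icon        BLOB             -- optional PNG/SVG of the rendering
);
CREATE TABLE tag_key_translation (
    key_name    TEXT NOT NULL REFERENCES tag_key(name),
    language    TEXT NOT NULL,   -- ISO 639 code
    description TEXT NOT NULL,
    PRIMARY KEY (key_name, language)
);
CREATE TABLE tag_value (
    key_name    TEXT NOT NULL REFERENCES tag_key(name),
    value       TEXT NOT NULL,
    description TEXT NOT NULL,   -- authoritative English text
    icon        BLOB,
    PRIMARY KEY (key_name, value)
);
CREATE TABLE tag_value_translation (
    key_name    TEXT NOT NULL,
    value       TEXT NOT NULL,
    language    TEXT NOT NULL,   -- ISO 639 code
    name        TEXT NOT NULL,
    description TEXT NOT NULL,
    PRIMARY KEY (key_name, value, language),
    FOREIGN KEY (key_name, value) REFERENCES tag_value (key_name, value)
);
"""


def localised_value_name(conn, key, value, language):
    """Return the localised name of a value, or the raw value if none exists."""
    row = conn.execute(
        "SELECT name FROM tag_value_translation "
        "WHERE key_name = ? AND value = ? AND language = ?",
        (key, value, language),
    ).fetchone()
    return row[0] if row else value


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Sample rows, invented for the example.
conn.execute(
    "INSERT INTO tag_key VALUES (?, ?, ?, NULL)",
    ("highway", "enumeration", "Kind of road or path"),
)
conn.execute(
    "INSERT INTO tag_value VALUES (?, ?, ?, NULL)",
    ("highway", "cycleway", "Designated cycle path"),
)
conn.execute(
    "INSERT INTO tag_value_translation VALUES (?, ?, ?, ?, ?)",
    ("highway", "cycleway", "no", "sykkelvei", "Vei for syklister"),
)
```

An editor with a Norwegian user interface could then call
localised_value_name(conn, "highway", "cycleway", "no") to fill its
listbox, falling back to the raw value when no translation exists.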
Organisation:
OSM is a community of volunteers, so neither bureaucracy nor dictatorship
is likely to be the way to go. I would guess that forking off a “tagging”
mail group with a strict “keep-to-topic” policy would be the way to
proceed. It could deal with tagging schema/policy in general, as well as
core tagging, and assigning top level keys to other sub level tagging
groups.
Well, it is time to get some sleep before work calls tomorrow. I am not
going to implement any of this. I just hope these ideas can spawn some
productive debate.
Best regards
Egil Hjelmeland