[OSM-talk] Tagging schema

David Earl david at frankieandshadow.com
Mon Oct 5 10:54:24 BST 2009


On 05/10/2009 00:12, Egil Hjelmeland wrote:
> As a mapper, I want a much more structured, well defined tagging scheme.

Steve started a discussion on the dev list in which I proposed just such 
a scheme/schema. Since there have been several discussions on talk heading 
in this direction, I'll send it here too:

------------
I would like to go further than Steve proposes, but I understand it is a 
hard sell to the anarchist wing of OSM while being welcome to the 
conformist wing. But I really think what I outline below treads a middle 
way. I'm absolutely not proposing to restrict anyone's freedom to add 
new tags or values.

I think it would be really helpful to bring together the tag definitions 
into one place, *in the database and API itself*. I mean a complete 
schema: the tags, their possible values, their descriptions (in multiple 
languages), their equivalences both in other languages and synonyms, 
their related tags (in essence properties of the main descriptive tag, 
hence oneway=... with highway=...), deprecations and so on.

And I think this gets changed as other objects in the database get 
changed: freely but consciously. So if there is a new value for shop, it 
is a conscious act to add that to the list of values for shop, and to 
describe it, not just casually adding it as a tag value.

Let me be quite clear again: this doesn't restrict anyone's freedom to 
add new tags or values. Anyone can edit them just like the map data. It 
does make it a little more work, but the value of doing so both to the 
person making the change and the rest of the community is also increased:
(a) the tag/value is publicised, not buried in the map data, so if it is 
a good one, it is more likely to be adopted. For example, take 
"landuse=orchard" discussed recently. I've tagged at least three areas 
with landuse=orchard in the last 3 years. I just did it. Others may have 
used land=orchard, whatever. However, it would only be obvious I'd done 
this if the renderers knew about it or I'd made a song and dance about 
it. With a central schema, it would automatically be possible for it to 
appear on editor menus for example.
(b) if we choose to check data against this schema, spelling mistakes 
would be eliminated (not in names and other naturally free form data, 
obviously)
(c) editor and consumer programs can all work off the same schema: 
presets and menus of values are table driven and in sync, renderers know 
the possible things they might want to render (not that they have to of 
course) and can see automatically that highway=gate and barrier=gate are 
the same thing (or indeed barriere=tor or barrière=porte).
(d) the meaning of newly introduced or changed tags goes along with 
them, so that the intention is described to others. Editors can offer 
help. Renderers can offer legends.
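
To make points (b) and (c) concrete, here is a minimal sketch of how an
editor or renderer might consult such a schema. The dictionaries and the
function names (KNOWN_VALUES, SYNONYMS, canonical, check_tag) are
hypothetical, invented only for illustration; nothing like this exists in
the API today. Note that the check only reports, it never rejects.

```python
# Hypothetical in-memory copy of the schema (illustration only).
# Documented values for each key:
KNOWN_VALUES = {
    "barrier": {"gate", "bollard"},
    "landuse": {"orchard"},
}

# Tags declared to mean the same thing as another tag:
SYNONYMS = {
    ("highway", "gate"): ("barrier", "gate"),
    ("barrier", "bollards"): ("barrier", "bollard"),
}


def canonical(key, value):
    """Return the preferred (key, value) pair for a tag, following synonyms."""
    return SYNONYMS.get((key, value), (key, value))


def check_tag(key, value):
    """Report whether a tag is documented; undocumented tags are still allowed."""
    key, value = canonical(key, value)
    if key not in KNOWN_VALUES:
        return "unknown key"
    if value not in KNOWN_VALUES[key]:
        return "unknown value"
    return "ok"
```

With this, check_tag("highway", "gate") reports "ok" because the synonym is
resolved first, while a misspelling such as landuse=orchrad is flagged as
"unknown value" so an editor could prompt the mapper before saving.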

Here's the kind of thing I had in mind:

* Three new primitives, tagkey for describing the k part of tags, 
tagvalue for the v part of tags and tagdescription separated off to 
allow for multiple descriptions in multiple languages without having to 
download all the data for languages you're not interested in. ("tagkey" 
etc can be anything we want, don't get too hung up on the terminology, I 
just use it for didactic purposes).

In the following, the fields could be key/value pairs, i.e. tags 
themselves, or separate named fields in the database depending on how 
things need to be indexed. But allowing the schema to itself have tags 
means it is extensible. Perhaps it can even be self-describing.

tagkey
   name = [tagkey]
   type = text | scalar | real | integer | boolean | value
          where...
          text: any arbitrary string
          scalar: a number possibly qualified by some units
          real: a floating point number
          integer: an integer
          boolean: values such as 'yes', 'true', '1', 'no', 'false', '0'
          value: a value chosen from among a specific set of strings
                 documented by the tagvalue object
   units = [semicolon separated list of possible units]
   defaultunits = [one from the units list]
   appliesto = [semicolon separated list of tagkey or tagkey=tagvalue]
          indicates this tag is usually used as a property qualifying the
          given tags
   relevantto = area | node | way | relation

tagvalue
   name = [tagvalue]
   appliesto = [tagkey]
   relevantto = area | node | way | relation
   photo[:N] = [url] <!-- allows for more than one photo, photo:1 etc -->
   synonym = [tagkey or tagkey=tagvalue]
   seealso = [tagkey or tagkey=tagvalue]

tagdescription
   lang = [languagecode]
   appliesto = [tagkey or tagkey=tagvalue]
   plus a description in that language (not a tag value)
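
As a rough sketch of what the type field on tagkey could buy us, a checker
along these lines could test a value against its declared type. The
function name value_matches_type and its parameters are hypothetical, and
the exact syntax accepted for each type (for instance, how a scalar and its
unit are written) is an assumption rather than part of the proposal.

```python
import re

# The boolean spellings listed in the tagkey description above.
BOOLEAN_VALUES = {"yes", "true", "1", "no", "false", "0"}


def value_matches_type(value, type_, units=(), allowed=()):
    """Return True if the string `value` fits the declared tagkey type.

    `units` stands in for the tagkey's units list and `allowed` for the
    names of its tagvalue objects (both hypothetical parameters).
    """
    if type_ == "text":
        return True  # any arbitrary string
    if type_ == "boolean":
        return value in BOOLEAN_VALUES
    if type_ == "integer":
        return re.fullmatch(r"[+-]?\d+", value) is not None
    if type_ == "real":
        try:
            float(value)
        except ValueError:
            return False
        return True
    if type_ == "scalar":
        # Assumed syntax: a number, optionally followed by one unit token.
        m = re.fullmatch(r"([+-]?\d+(?:\.\d+)?)\s*(\S*)", value)
        return m is not None and (m.group(2) == "" or m.group(2) in units)
    if type_ == "value":
        return value in allowed
    raise ValueError("unknown type: " + type_)
```

So value_matches_type("3.5 m", "scalar", units=("m", "ft")) passes, while
the same call with "3.5 furlongs" does not, because that unit is not in the
list given for the key.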

For example
   <tagkey name='barrier' type='value' />
   <tagvalue name='gate' appliesto='barrier' relevantto='node' />
   <tagvalue name='bollard' appliesto='barrier' relevantto='node' />
   <tagvalue name='bollards' appliesto='barrier' relevantto='node'
    synonym='bollard'/>
   <tagdescription lang='en' appliesto='barrier=bollard'>one or a series 
of short posts for excluding or diverting motor vehicles from a road, 
lawn, or the like</tagdescription>

and so on.
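
For a sense of how a client might consume this, the following sketch reads
the barrier example with Python's standard xml.etree.ElementTree module.
The wrapping <schema> element and the load_schema function are assumptions
made so that the fragment parses as a single document; the proposal does
not define either. The synonym attribute is stored exactly as written in
the example.

```python
import xml.etree.ElementTree as ET

# The example above, wrapped in a hypothetical <schema> root element.
SCHEMA_XML = """
<schema>
  <tagkey name='barrier' type='value' />
  <tagvalue name='gate' appliesto='barrier' relevantto='node' />
  <tagvalue name='bollard' appliesto='barrier' relevantto='node' />
  <tagvalue name='bollards' appliesto='barrier' relevantto='node'
   synonym='bollard'/>
  <tagdescription lang='en' appliesto='barrier=bollard'>one or a series
of short posts for excluding or diverting motor vehicles from a road,
lawn, or the like</tagdescription>
</schema>
"""


def load_schema(xml_text):
    """Parse schema XML into three simple lookup tables."""
    root = ET.fromstring(xml_text)
    values = {}        # key -> set of documented values
    synonyms = {}      # (key, value) -> synonym attribute as written
    descriptions = {}  # (lang, appliesto) -> description text
    for el in root.iter("tagvalue"):
        key = el.get("appliesto")
        values.setdefault(key, set()).add(el.get("name"))
        if el.get("synonym"):
            synonyms[(key, el.get("name"))] = el.get("synonym")
    for el in root.iter("tagdescription"):
        # Collapse the line breaks inside the description into single spaces.
        text = " ".join((el.text or "").split())
        descriptions[(el.get("lang"), el.get("appliesto"))] = text
    return values, synonyms, descriptions


values, synonyms, descriptions = load_schema(SCHEMA_XML)
```

An editor could build its preset menu for barrier from values["barrier"],
and show descriptions[("en", "barrier=bollard")] as help text.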

David



