[OSM-dev] Restrict key names on order to retain reusability of OSM

Andy Robinson (blackadder) blackadderajr at googlemail.com
Tue Feb 12 13:37:01 GMT 2008

Stefan Keller wrote:
>Sent: 12 February 2008 1:27 PM
>To: Dave Stubbs
>Cc: dev at openstreetmap.org
>Subject: Re: [OSM-dev] Restrict key names on order to retain reusability of
>Thanks for the pointer to XML, Dave.
>UTF-8 is a good choice for content, but this is about *keys* (i.e.
>Keys correspond to XML elements which are defined as names [1](!)
>... which nicely fits the definition I proposed.

The only reason you have found that a large chunk of the keys fit your
limited character set is that most are based on Map Features, which was
written in English. Anyone could create a set of keys in any other form
they wish, for instance using Chinese.



>And to get the discussion a little more specific I made some statistics
>with some recent OSM data from a European area of about 75MB:
>From about 100'000 key-value pairs there are about 8000 distinct pairs
>and I found about 8 outliers, listed below. This is at least what came
>out perhaps of OSM REST API 0.5 (or Osmosis)?
>So, the benefit of valid attribute names costs almost nothing to clean,
>almost nothing to prevent (e.g. in editors) but lets us write nice
>applications - and I mean a lot more than those you mentioned above...
>[1] http://www.w3.org/TR/2006/REC-xml-20060816/#sec-common-syn
>Outliers found in recent OSM data:
>'Node/Linear/Area  '='Route sans Nom'
>'Tunnel '='yes'
>'wdb:source'='CIA World database II - europe-bdy.txt - segment 100'
>2008/2/12, Dave Stubbs <osm.list at randomjunk.co.uk>:
>	2008/2/12 Stefan Keller <sfkeller at gmail.com>:
>	> GML/XML is *not* the issue, you know that:
>	> It's almost any application outside OSM database.
>	> It's about reusability and consistency!
>	>
>	> I love the approach of key-value pairs (and I like beers too... ;-
>	> I agree with Martijn that before all, spaces must be kept out.
>	> I agree too with Frederik: Colons can be included as namespace
>	> Namespaces, tags and keys remind us that OSM is a database and
>	> *not* a Wiki on an island (whereas I'm loving Wikis used as they
>	>
>	>
>	> So I'm sorry, guys, but I have to insist:
>	> I propose distinctly to restrict key names (element, tag) to the
>	> 'aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ_', now
>	> plus colon as namespace delimiter, allowed once and not at the
>	> beginning or the end.
>	Even XML allows significantly more than that -- pretty much anything
>	but whitespace [1], with a ":" as namespace delimiter.
>	So insist all you like, but personally I think making people handle
>	UTF-8 nicely is probably a good thing given the number of values that
>	will rely on it heavily anyway. Most reasonable programming
>	environments have decent unicode support these days, and certainly
>	every XML parser that isn't a hack.
>	Dave
>	[1] http://www.w3.org/TR/2006/REC-xml-20060816/#charsets
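
For reference, the restriction proposed in the quoted message can be
sketched as a simple check. This is only a minimal sketch in Python under
one reading of the proposal (ASCII letters and underscore, plus at most one
colon that is neither first nor last). The names KEY_RE, is_restricted_key
and find_outliers are hypothetical, and the sample pairs are illustrative:
three come from the outlier list above, the other two are made up.

```python
import re

# Assumed reading of the proposed rule: ASCII letters and underscore only,
# with at most one colon that is neither the first nor the last character.
KEY_RE = re.compile(r"[A-Za-z_]+(?::[A-Za-z_]+)?")


def is_restricted_key(key: str) -> bool:
    """Return True if the whole key fits the assumed restricted form."""
    return KEY_RE.fullmatch(key) is not None


def find_outliers(pairs):
    """Return the (key, value) pairs whose key does not fit the rule."""
    return [(k, v) for k, v in pairs if not is_restricted_key(k)]


sample = [
    ("highway", "residential"),                # made-up English key
    ("Node/Linear/Area  ", "Route sans Nom"),  # slashes and trailing spaces
    ("Tunnel ", "yes"),                        # trailing space
    ("wdb:source", "CIA World database II - europe-bdy.txt - segment 100"),
    ("名称", "example"),                        # made-up non-ASCII key
]

for key, value in find_outliers(sample):
    print(repr(key), "=", repr(value))
```

Under this reading 'wdb:source' is accepted because it contains a single
inner colon, while any key written in a non-Latin script is rejected, which
is the trade-off being argued about in this thread.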
