[OSM-dev] Restrict key names on order to retain reusability of OSM
Andy Robinson (blackadder)
blackadderajr at googlemail.com
Tue Feb 12 13:37:01 GMT 2008
Stefan Keller wrote:
>Sent: 12 February 2008 1:27 PM
>To: Dave Stubbs
>Cc: dev at openstreetmap.org
>Subject: Re: [OSM-dev] Restrict key names on order to retain reusability of
>Thanks for the pointer to XML, Dave.
>UTF-8 is a good choice for content, but this is about *keys* (i.e.
>Keys correspond to XML elements which are defined as names (!)
>... which nicely fits the definition I proposed.
The only reason that you have found a large chunk of the keys fitting your
limited character set is that most are based on Map Features, which was
written in English. Anyone could create a set of keys in any other language
or script they wish, for instance Chinese.
>And to get the discussion a little more specific I made some statistics
>with some recent OSM data from a European area of about 75MB:
>From about 100'000 key-value pairs there are about 8000 distinct pairs
>and I found about 8 outliers, listed below. This is at least what came
>out of the OSM REST API 0.5 (or perhaps Osmosis?).
>So, the benefit of valid attribute names costs almost nothing to clean,
>almost nothing to prevent (e.g. in editors) but lets us write nice
>applications - and I mean a lot more than those you mentioned above...
>Outliers found in recent OSM data:
>'Node/Linear/Area '='Route sans Nom'
>'wdb:source'='CIA World database II - europe-bdy.txt - segment 100'
>2008/2/12, Dave Stubbs <osm.list at randomjunk.co.uk>:
> 2008/2/12 Stefan Keller <sfkeller at gmail.com>:
> > GML/XML is *not* the issue, you know that:
> > It's almost any application outside OSM database.
> > It's about reusability and consistency!
> > I love the approach of key-value pairs (and I like beers too... ;-
> > I agree with Martijn that before all, spaces must be kept out.
> > I agree too with Frederik: Colons can be included as namespace
> > Namespace, tags and keys remind us that OSM is a database and
> > *not* a Wiki on an island (whereas I'm loving Wikis used as they
> > So I'm sorry, guys, but I have to insist:
> > I propose distinctly to restrict key names (element, tag) to the
> > 'aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ_', now
> > plus colon as namespace delimiter, allowed once and not at the
> > the end.
> Even XML allows significantly more than that -- pretty much anything
> but whitespace, with a ":" as namespace delimiter.
> So insist all you like, but personally I think making people handle
> UTF-8 nicely is probably a good thing given the number of values that
> will rely on it heavily anyway. Most reasonable programming
> environments have decent Unicode support these days, and certainly
> every XML parser that isn't a hack.
>  http://www.w3.org/TR/2006/REC-xml-20060816/#charsets
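
As a rough illustration of the restriction proposed in the quoted message
above (ASCII letters and underscore, plus a single colon as namespace
delimiter), a minimal Python sketch might look like the following. The names
is_restricted_key and find_outliers are hypothetical, and requiring a
non-empty part on both sides of the colon is an assumption; the quoted text
only says the colon is allowed once and not at the end. The sample reuses the
two outliers listed above plus one made-up Chinese key.

```python
import re

# Alphabet taken from the quoted proposal: ASCII letters plus underscore.
# Assumption: at most one colon, with a non-empty part on each side of it.
KEY_PATTERN = re.compile(r"[A-Za-z_]+(?::[A-Za-z_]+)?")


def is_restricted_key(key: str) -> bool:
    """Return True if the whole key fits the restricted character set."""
    return KEY_PATTERN.fullmatch(key) is not None


def find_outliers(tags):
    """Return distinct keys that fail the check, in first-seen order."""
    outliers = []
    for key, _value in tags:
        if not is_restricted_key(key) and key not in outliers:
            outliers.append(key)
    return outliers


sample = [
    ("highway", "residential"),
    ("Node/Linear/Area ", "Route sans Nom"),
    ("wdb:source", "CIA World database II - europe-bdy.txt - segment 100"),
    ("名称", "example"),  # hypothetical non-English key
]

print(find_outliers(sample))  # ['Node/Linear/Area ', '名称']
```

With the colon allowed, 'wdb:source' passes this check, while the key with
slashes and a trailing space and the Chinese key do not.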