[OSM-dev] Restrict key names in order to retain reusability of OSM

Stefan Keller sfkeller at gmail.com
Tue Feb 12 13:27:24 GMT 2008

Thanks for the pointer to XML, Dave.

UTF-8 is a good choice for content, but this is about *keys*.
Keys correspond to XML elements, which are defined as names [1](!).
That nicely fits the definition I proposed.

To make the discussion a little more specific, I gathered some statistics
from recent OSM data covering a European area (about 75 MB):
out of about 100'000 key-value pairs there are about 8000 distinct pairs,
and I found about 8 outliers, listed below. At least, this is what came
out of the OSM REST API 0.5 (or perhaps Osmosis?).
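Counts like these can be reproduced roughly with a short streaming script. The Python sketch below assumes a plain OSM XML extract (tags stored as `<tag k="..." v="..."/>` children of nodes, ways and relations) and uses the key rule proposed in this thread as the outlier criterion; `iter_tags` and `tag_statistics` are hypothetical names, and this is not the script actually used for the numbers above.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: the key rule proposed in this thread, i.e. ASCII letters and
# underscore, plus at most one colon as namespace delimiter (not first/last).
KEY_RE = re.compile(r"[A-Za-z_]+(?::[A-Za-z_]+)?")


def iter_tags(source):
    """Yield (key, value) pairs from an OSM XML file name or file object.

    OSM XML stores tags as <tag k="..." v="..."/> children of
    node, way and relation elements.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "tag":
            yield elem.get("k", ""), elem.get("v", "")
        elif elem.tag in ("node", "way", "relation"):
            elem.clear()  # children were already processed; free them


def tag_statistics(pairs):
    """Return (total pairs, distinct pairs, sorted pairs whose key breaks the rule)."""
    counts = Counter(pairs)
    outliers = sorted(p for p in counts if not KEY_RE.fullmatch(p[0]))
    return sum(counts.values()), len(counts), outliers
```

For example, `tag_statistics(iter_tags("extract.osm"))` (where `extract.osm` is a hypothetical file name) returns the total count, the distinct count and the offending key-value pairs. Streaming with `iterparse` and clearing finished elements keeps memory use low even for large extracts.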

So valid attribute names cost almost nothing to clean up and almost nothing
to enforce (e.g. in editors), but they let us write nice applications,
and I mean a lot more than the ones you mentioned below...
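To illustrate how cheap such a check could be on the editor side, here is a minimal Python sketch. `KEY_RE` encodes the proposed rule (ASCII letters and underscore, at most one colon, not first or last); `is_valid_key` and `clean_key` are hypothetical helpers, and the clean-up policy (trim whitespace, turn other illegal characters into `_`, keep only the first colon) is an assumption for illustration, not an agreed OSM convention.

```python
import re

# Assumption: the key rule proposed in this thread, i.e. ASCII letters and
# underscore, plus at most one colon as namespace delimiter (not first/last).
KEY_RE = re.compile(r"[A-Za-z_]+(?::[A-Za-z_]+)?")


def is_valid_key(key: str) -> bool:
    """Return True if `key` satisfies the proposed key rule."""
    return KEY_RE.fullmatch(key) is not None


def clean_key(key: str) -> str:
    """Hypothetical clean-up for existing keys.

    Trims surrounding whitespace, replaces each run of characters outside
    [A-Za-z_:] with '_', drops leading/trailing colons and keeps only the
    first remaining colon (later ones become '_'). The result can still be
    empty, so callers should re-check it with is_valid_key().
    """
    key = re.sub(r"[^A-Za-z_:]+", "_", key.strip())
    head, sep, tail = key.strip(":").partition(":")
    return head + sep + tail.replace(":", "_")
```

With this policy, `clean_key("Node/Linear/Area  ")` yields `Node_Linear_Area` and `clean_key("Tunnel ")` yields `Tunnel`. Because the rule as written has no digits, `clean_key` replaces digits with `_` as well.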


[1] http://www.w3.org/TR/2006/REC-xml-20060816/#sec-common-syn

Outliers found in recent OSM data:
'Node/Linear/Area  '='Route sans Nom'
'Tunnel '='yes'
'wdb:source'='CIA World database II - europe-bdy.txt - segment 100'

2008/2/12, Dave Stubbs <osm.list at randomjunk.co.uk>:
> 2008/2/12 Stefan Keller <sfkeller at gmail.com>:
> > GML/XML is *not* the issue, you know that:
> > It's almost any application outside OSM database.
> > It's about reusability and consistency!
> >
> > I love the approach of key-value pairs (and I like beers too... ;->).
> > I agree with Martijn that before all, spaces must be kept out.
> > I agree too with Frederik: Colons can be included as namespace delimiters.
> > Namespace, tags and keys reminds us, that OSM is a database and
> > *not* a Wiki on an island (whereas I'm loving Wikis used as they are)!
> >
> >
> > So I'm sorry, guys, but I have to insist:
> > I explicitly propose to restrict key names (element, tag) to the set
> > 'aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ_', now plus a
> > colon as namespace delimiter, allowed once and not at the beginning or
> > the end.
> Even XML allows significantly more than that -- pretty much anything
> but whitespace [1], with a ":" as namespace delimiter.
> So insist all you like, but personally I think making people handle
> UTF-8 nicely is probably a good thing given the number of values that
> will rely on it heavily anyway. Most reasonable programming
> environments have decent unicode support these days, and certainly
> every XML parser that isn't a hack.
> Dave
> [1] http://www.w3.org/TR/2006/REC-xml-20060816/#charsets