<div>Thanks for the pointer to XML, Dave.</div>
<div> </div>
<div>UTF-8 is a good choice for content, but this is about *keys* (i.e. attributes).<br>Keys correspond to XML elements, whose names are defined by the Name production [1] (!)</div>
<div>... which nicely fits the definition I proposed.</div>
<div> </div>
<div>To make the discussion a little more concrete, I gathered some statistics </div>
<div>from recent OSM data covering a European area (about 75 MB):</div>
<div>Of about 100'000 key-value pairs, about 8000 are distinct,</div>
<div>and I found about 8 outliers, listed below. That is at least what came </div>
<div>out of the data, presumably via OSM REST API 0.5 (or Osmosis?).</div>
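<div>For anyone who wants to reproduce numbers like these, here is a minimal Python sketch. It assumes the key-value pairs have already been pulled out of the OSM XML as (key, value) tuples (that step is not shown). The function names and the tiny sample are made up for illustration, and the regex is only my reading of the rule I proposed (quoted below): ASCII letters and '_', plus at most one ':' that is neither first nor last.</div>

```python
import re
from collections import Counter

# Proposed key rule (one reading of the thread): ASCII letters and '_' only,
# plus at most one ':' as namespace delimiter, not first and not last.
VALID_KEY = re.compile(r"[A-Za-z_]+(?::[A-Za-z_]+)?")


def is_valid_key(key: str) -> bool:
    """Return True if the whole key matches the proposed rule."""
    return VALID_KEY.fullmatch(key) is not None


def tag_statistics(pairs):
    """Return (total pairs, distinct pairs, sorted distinct pairs with an invalid key).

    `pairs` is any iterable of (key, value) tuples; extracting them from an
    OSM file is assumed to have happened already.
    """
    counts = Counter(pairs)
    outliers = sorted(pair for pair in counts if not is_valid_key(pair[0]))
    return sum(counts.values()), len(counts), outliers


# Hypothetical sample, not taken from the 75 MB extract.
sample = [
    ("highway", "residential"),
    ("highway", "residential"),
    ("name", "Südstrasse"),   # UTF-8 in the value is fine
    ("Tunnel ", "yes"),       # a trailing space in the key is not
    ("whc:id", "268"),        # one namespace colon is allowed
]
print(tag_statistics(sample))  # (5, 4, [('Tunnel ', 'yes')])
```

<div>Only the key is checked; values keep full UTF-8, which matches the point that this is about keys, not content.</div>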
<div> </div>
<div>So valid attribute names cost almost nothing: the data is cheap to clean </div>
<div>and the problem is cheap to prevent (e.g. in editors), yet it lets us write nice </div>
<div>applications, and I mean a lot more than the ones you mentioned above...</div>
<div> </div>
<div>Stefan</div>
<div> </div>
<div>[1] <a href="http://www.w3.org/TR/2006/REC-xml-20060816/#sec-common-syn">http://www.w3.org/TR/2006/REC-xml-20060816/#sec-common-syn</a><br> </div>
<div>Outliers found in recent OSM data:</div>
<div>'Node/Linear/Area '='Route sans Nom'<br>'Tunnel '='yes'<br>'opm:capacity'='2'<br>'wdb:source'='CIA World database II - europe-bdy.txt - segment 100'<br>
'whc:criteria'='(ii)(iv)'<br>'whc:id'='268'<br>'whc:inscription_date'='1983'<br>'¨name'='Südstrasse'<br> </div>
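<div>A quick way to see why these keys stand out is to test them against the proposed rule twice: once with the strict character set and once with the single namespace colon allowed. This is a minimal Python sketch; the helper names are hypothetical, and the keys are copied as listed, including the trailing spaces and the stray '¨' (written as \u00a8 in the code).</div>

```python
import re
import string

# Keys exactly as listed above, including the trailing spaces
# and the stray diaeresis U+00A8 in front of "name".
OUTLIER_KEYS = [
    "Node/Linear/Area ", "Tunnel ", "opm:capacity", "wdb:source",
    "whc:criteria", "whc:id", "whc:inscription_date", "\u00a8name",
]

# Strict set: ASCII letters and '_' only.
STRICT = re.compile(r"[A-Za-z_]+")
# Same set plus one ':' namespace delimiter, not first and not last.
WITH_NAMESPACE = re.compile(r"[A-Za-z_]+(?::[A-Za-z_]+)?")

ALLOWED_CHARS = set(string.ascii_letters + "_:")


def rejected(keys, pattern):
    """Return the keys that do not fully match `pattern`, in input order."""
    return [key for key in keys if pattern.fullmatch(key) is None]


def offending(key):
    """Return the sorted characters of `key` outside [A-Za-z_:]."""
    return sorted(set(key) - ALLOWED_CHARS)


print(len(rejected(OUTLIER_KEYS, STRICT)))  # 8
for key in rejected(OUTLIER_KEYS, WITH_NAMESPACE):
    print(repr(key), offending(key))
```

<div>With the colon allowed, only 'Node/Linear/Area ', 'Tunnel ' and '¨name' are rejected; the namespaced opm:, wdb: and whc: keys count as outliers only under the strict set. A check like offending() could also give an editor a concrete error message.</div>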
<div> </div>
<div><span class="gmail_quote">2008/2/12, Dave Stubbs <<a href="mailto:osm.list@randomjunk.co.uk">osm.list@randomjunk.co.uk</a>>:</span>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">2008/2/12 Stefan Keller <<a href="mailto:sfkeller@gmail.com">sfkeller@gmail.com</a>>:<br>> GML/XML is *not* the issue, you know that:<br>
> It's almost any application outside the OSM database.<br>> It's about reusability and consistency!<br>><br>> I love the approach of key-value pairs (and I like beers too... ;->).<br>> I agree with Martijn that above all, spaces must be kept out.<br>
> I also agree with Frederik: colons can be included as namespace delimiters.<br>> Namespaces, tags and keys remind us that OSM is a database and<br>> *not* a wiki on an island (though I love wikis used as they are)!<br>
><br>><br>> So I'm sorry, guys, but I have to insist:<br>> I propose distinctly to restrict key names (element, tag) to the set<br>> 'aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ_', now<br>
> plus colon as namespace delimiter, allowed once and not at the beginning or<br>> the end.<br><br><br>Even XML allows significantly more than that -- pretty much anything<br>but whitespace [1], with a ":" as namespace delimiter.<br>
So insist all you like, but personally I think making people handle<br>UTF-8 nicely is probably a good thing given the number of values that<br>will rely on it heavily anyway. Most reasonable programming<br>environments have decent unicode support these days, and certainly<br>
every XML parser that isn't a hack.<br><br>Dave<br><br>[1] <a href="http://www.w3.org/TR/2006/REC-xml-20060816/#charsets">http://www.w3.org/TR/2006/REC-xml-20060816/#charsets</a><br></blockquote></div><br>