[OSM-dev] Restrict key names in order to retain reusability of OSM

Wed Feb 13 04:23:33 GMT 2008

Stefan Keller wrote:
> You are right that XML names (= keys/tags) are valid in unicode
> in which case the encoding of the whole XML document (exchange file)
> must support this.
> 
> But you know well that many tools have problems with non-ASCII XML
> element and attribute names (for content/value UTF-8 is ok since
> chars can be escaped)!
> 
> So, my last 20 cents for valid key names before I give up is the following:
> 'aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ_-.0123456789'
> whereas such qualified names must begin with a letter and contain at
> most one colon and have at most a length of 255.

Stefan, if I am coming across in this message as a bit harsh, then 
you're not mistaken - I am a grumpy old man, and damn proud of it. Just 
remember, it's not personal. I try to go after the ball, not the man. 
(No, FakeSteveC, it doesn't mean I try to go after a guy's ball(s) in THAT 
way..)

Three times you have posted that you want to limit the characters used 
in tag naming, revising your proposal first to include the colon, and 
now to include numbers. On each previous attempt you have been told that 
UTF-8 is valid, for good reasons, and yet you still persist.

You have not once given a valid TECHNICAL reason for such a change, 
WITHIN THE SCOPE OF OSM, for limiting the characters allowable in tag 
names.

As far as I can see from your first message on this subject, your idea 
stems from converting OSM data from its XML format to GML.

Your project might need GML; OSM doesn't.

If you need GML-compliant output, then it is your task to massage the 
OSM-provided data into GML-compliant output. It is not the task of OSM 
to have the data in a GML-compliant format, since the XML format with 
UTF-8 allowed just plain works for OSM.
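
As a rough illustration of what such massaging could look like on the 
consumer side, here is a minimal Python sketch that maps an arbitrary 
tag key onto the restricted alphabet from the quoted proposal (ASCII 
letters, digits, "_-.", at most one colon, a leading letter, at most 255 
characters). The function name, the "_xHEX_" escape scheme and the "k" 
prefix are assumptions made up for this example; they are not part of 
OSM or GML, and the mapping is not guaranteed to be collision-free.

```python
import string

# Alphabet from the quoted proposal: ASCII letters, digits and "_-."
ALLOWED = set(string.ascii_letters + string.digits + "_-.")
MAX_LEN = 255


def to_restricted_key(key: str) -> str:
    """Map an OSM tag key onto the restricted alphabet (hypothetical helper)."""
    out = []
    colon_seen = False
    for ch in key:
        if ch in ALLOWED:
            out.append(ch)
        elif ch == ":" and not colon_seen:
            # The proposal permits at most one colon, so keep the first one.
            colon_seen = True
            out.append(ch)
        else:
            # Escape anything else as its code point in hex (assumed scheme).
            out.append(f"_x{ord(ch):X}_")
    result = "".join(out)
    if not result or result[0] not in string.ascii_letters:
        # The proposal requires a leading letter; "k" is an arbitrary choice.
        result = "k" + result
    if len(result) > MAX_LEN:
        raise ValueError(f"converted key exceeds {MAX_LEN} characters: {key!r}")
    return result
```

The OSM data itself stays untouched; only the copy handed to the 
restricted tool is rewritten, for example 
to_restricted_key("name:de") for a key that already fits.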

The tools that you state have problems with non-ASCII characters should 
be fixed to be able to handle the UTF-8 characters. Not the other way 
around, by changing the dataset to comply with the requirements of the 
tools.
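
To show that this is not a tall order for a tool, here is a small 
Python sketch that reads tag keys and values from an OSM-style XML 
fragment using only the standard library. The sample document and its 
keys are made up for illustration, and read_tags is a hypothetical 
helper name, not an existing OSM API. Note that in OSM XML the key is 
carried in the "k" attribute value of a tag element, so it never has to 
be a valid XML element or attribute name.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape of an OSM XML file, encoded as UTF-8 bytes.
OSM_SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<osm>
  <node id="1" lat="55.0" lon="12.0">
    <tag k="navn" v="København"/>
    <tag k="名前" v="コペンハーゲン"/>
  </node>
</osm>
""".encode("utf-8")


def read_tags(osm_bytes: bytes) -> dict:
    """Collect key/value pairs from every <tag> element (hypothetical helper)."""
    root = ET.fromstring(osm_bytes)
    # Keys and values are plain attribute values, so any UTF-8 text is fine.
    return {tag.get("k"): tag.get("v") for tag in root.iter("tag")}


tags = read_tags(OSM_SAMPLE)
```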

You might think it's a hen and egg situation, although in this case, the 
egg definitely is the important part, and has priority.
The egg (the data) in this case has attributes that can contain 
non-ASCII characters, thereby allowing people whose languages are not 
written in the Latin alphabet to define their own tags in their own 
language. This is a GOOD thing, which should NOT be changed. The hen 
(tools and programs utilizing OSM data) must take this into account. If 
a tool can't do that, then the farmer (the user of that tool) has to 
either change that tool, or use the egg to prepare a dish that the tool 
can digest (massage the OSM data into a format the tool can use). The 
farmer should not try to persuade the egg that it is better off as a 
watermelon.

So to recap: OSM tag names may currently contain any UTF-8 character. 
Deal with it, instead of trying to impose limitations on OSM to make 
OSM data comply with YOUR requirements.

Dutch