[OSM-dev] Restrict key names in order to retain reusability of OSM
J.D. Schmidt
jdsmobile at gmail.com
Wed Feb 13 04:23:33 GMT 2008
Stefan Keller wrote:
> You are right that XML names (= keys/tags) are valid in unicode
> in which case the encoding of the whole XML document (exchange file)
> must support this.
>
> But you know well that many tools have problems with non-ASCII XML
> element and attribute names (for content/value UTF-8 is ok since
> chars can be escaped)!
>
> So, my last 20cents for valid key names before I give up is the following:
> 'aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ_-.0123456789'
> whereas such qualified names must begin with a letter and contain at
> most one colon and have at most a length of 255.
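For reference, the rule quoted above can be written down as a small check. This is only a sketch in Python of one reading of it: the function name is_valid_key is made up for illustration, and since the colon is not in the quoted character list, it is assumed here that a single colon is allowed as a separator and must be followed by at least one character.

```python
import re

# One reading of the quoted proposal (an assumption, not OSM policy):
# ASCII letters, digits, '_', '-' and '.', starting with a letter,
# with at most one ':' acting as a separator between two parts.
_PROPOSED_KEY = re.compile(r"[A-Za-z][A-Za-z0-9_.\-]*(?::[A-Za-z0-9_.\-]+)?")


def is_valid_key(key: str) -> bool:
    """Return True if key fits the proposed restriction as interpreted above."""
    return len(key) <= 255 and _PROPOSED_KEY.fullmatch(key) is not None
```

Under this reading, keys such as "highway" and "name:de" pass, while any key written in a non-Latin script fails.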
Stefan, if I am coming across in this message as a bit harsh, then
you're not mistaken - I am a grumpy old man, and damn proud of it. Just
remember, it's not personal. I try to go after the ball, not the man.
(No FakeSteveC, it doesn't mean I try to go after a guy's ball(s) in THAT
way..)
Three times you have posted that you want to limit the characters used
in tag naming, revising your proposal first to include the colon, and
now to include numbers. On each previous attempt you have been told that
UTF-8 is valid, for good reasons, and yet still you persist.
You have not once given a valid TECHNICAL reason for such a change,
WITHIN THE SCOPE OF OSM, for limiting the characters allowable in tag
names.
As far as I can see from your first message on this subject, your idea
stems from converting OSM data from its XML format to GML.
Your project might need GML, OSM doesn't.
If you need GML-compliant output, then it is your task to massage the
data OSM provides into GML-compliant output. It is not the task of OSM
to keep its data in a GML-compliant format, since the XML format, with
UTF-8 allowed, just plain works for OSM.
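To give an idea of what such massaging could look like on the converter side, here is a sketch in Python. It is not anything OSM or GML prescribes: the names escape_key and unescape_key, the "osm." prefix and the _xHHHH_ escape form are all assumptions picked for illustration. It turns any UTF-8 key into an ASCII-only name that begins with a letter, and back again without loss.

```python
import re

PREFIX = "osm."  # hypothetical prefix so every escaped name starts with a letter

# Characters copied through unchanged. '_' is left out on purpose,
# because it introduces the escape sequences below.
_SAFE = re.compile(r"[A-Za-z0-9.\-]")
_ESCAPE = re.compile(r"_x([0-9A-F]+)_")


def escape_key(key: str) -> str:
    """Map an arbitrary OSM key to an ASCII-only name, reversibly."""
    parts = []
    for ch in key:
        if _SAFE.fullmatch(ch):
            parts.append(ch)
        else:
            # Everything else, including ':' and '_', becomes _xHHHH_.
            parts.append("_x%04X_" % ord(ch))
    return PREFIX + "".join(parts)


def unescape_key(name: str) -> str:
    """Recover the original OSM key from a name produced by escape_key."""
    if not name.startswith(PREFIX):
        raise ValueError("not an escaped OSM key: %r" % name)
    body = name[len(PREFIX):]
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), body)
```

With this, "name:de" becomes "osm.name_x003A_de", and a Cyrillic or Japanese key survives a round trip unchanged. The escaped names get longer than the originals, so a tool with a hard length limit would need something more on top.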
The tools that you state have problems with non-ASCII characters should
be fixed to be able to handle the UTF-8 characters. Not the other way
around, by changing the dataset to comply with the requirements of the
tools.
You might think it's a hen and egg situation, although in this case, the
egg definitely is the important part, and has priority.
The egg (the data) in this case has attributes that can contain
non-ASCII characters, thereby allowing people who write in non-Latin
scripts to define their own tags in their own language. This is a GOOD
thing, which should NOT be changed. The hen (tools and programs
utilizing OSM data) must take this into account. If a tool can't do
that, then the farmer (the user of that tool) has to either change that
tool, or use the egg to prepare a dish that the tool can digest (massage
the OSM data into a format the tool can use). The farmer should not try
to persuade the egg that it is better off as a watermelon.
So to recap: The characters currently allowable in OSM tag names are
anything valid in UTF-8. Deal with it, instead of trying to impose
limitations on OSM to make OSM data comply with YOUR requirements.
Dutch