[OSM-dev] Disallowing certain characters in tag keys

Jochen Topf jochen at remote.org
Sat Oct 16 21:42:11 BST 2010

On Sat, Oct 16, 2010 at 10:31:47PM +0200, Andreas Kalsch wrote:
> I agree with
> whitespace - this can be very confusing
> "="
> To add:
> Make keys lowercase (or even remove diacritics), because keys are always simple names.

That's a different issue. I agree that keys normally should be lowercase. But that's
just a nice convention. There might be good reasons for uppercase keys, for instance
when the key name was used in upper case in some other system from which data was imported.

There is no need to force people into a convention here. That's different from the issue
I have been talking about where there are real problems with some characters.


> Am 16.10.10 20:44, schrieb Jochen Topf:
>> Hi!
>> I am currently fighting some issues where tags with strange characters in them
>> need to be represented in a URL for Taginfo. Lots of other websites probably
>> will have similar issues. Characters like /, ?, &, etc. have special meaning
>> in URLs so if they appear in tags I can't have those tags in URLs. Sometimes
>> escaping characters as %XX helps, sometimes not. And those problems are not
>> confined to web pages and URLs only. Special characters that need escaping
>> are often a problem.
>> We can't really do anything about that with regard to tag values, they must be
>> allowed to contain all those characters. But it would help at least a little if
>> we knew those characters can never appear in tag keys. And I can't really see a
>> legitimate reason why we need those characters in keys. Looking at the database
>> almost all cases where they appear in keys are obvious errors. Out of the about
>> 20000 different keys, there are only about 190 keys with problematic characters
>> in them (another about 800 with whitespace). Really the only case that I can't
>> immediately rule out as errors or see an alternative tagging are tag keys like
>> "maxspeed:weight>7.5". And with those you can already see the problems: Some of
>> them have "&gt;" instead of the ">".
>> So I'd like us to think about whether we can disallow a few characters from
>> appearing in tag keys. Technically this would mean changing the API to check
>> for those characters, removing any that are already in the database (can be
>> done with normal manual edits because there are so few cases) and adding checks
>> to the editors so that they can give meaningful error messages. Shouldn't be
>> too hard.
>> So, what characters am I talking about? I haven't drawn up a complete list
>> and we certainly would need to discuss this further.
>> Here is a preliminary list:
>> Whitespace   Should use '_' instead of whitespace in keys; whitespace is
>>               also very confusing for users, especially at the beginning and end
>>               of a text.
>> <>&/+?#;%'"  Special characters in XML, HTML and/or URLs.
>> \'"          Characters often used for quoting.
>> =            Because it's used in many places as the separation character
>>               between tag key and tag value. If we disallow this, we can always
>>               treat one string like "foo=bar" as k:foo, v:bar without any
>>               ambiguities.
>> This is a small list of special characters, all other characters should still
>> be allowed. That means tag keys can still be in Chinese or whatever. We'd just
>> disallow a few characters that we know will cause problems again
>> and again.
>> And to emphasize this again: I am only talking about tag keys. Tag values must
>> be allowed to contain the full Unicode set of characters.
>> Jochen
> _______________________________________________
> dev mailing list
> dev at openstreetmap.org
> http://lists.openstreetmap.org/listinfo/dev
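
To make the proposal in the quoted message concrete, here is a minimal Python sketch of such a key check. It is only an illustration, not the API's or any editor's actual code: the names FORBIDDEN_KEY_CHARS, problem_chars, is_acceptable_key and split_tag are made up for this example, and the character set is simply the preliminary list from above, which is still up for discussion.

```python
# Preliminary list from the quoted message: <>&/+?#;%'" plus backslash and '='.
# Whitespace is handled separately via str.isspace(). This set is an assumption
# taken from the draft list, not an agreed rule.
FORBIDDEN_KEY_CHARS = set("<>&/+?#;%'\"\\=")


def problem_chars(key):
    """Return the sorted distinct characters in key that the draft list disallows."""
    return sorted({c for c in key if c in FORBIDDEN_KEY_CHARS or c.isspace()})


def is_acceptable_key(key):
    """True if key contains none of the draft-list characters and no whitespace."""
    return not problem_chars(key)


def split_tag(text):
    """Split 'key=value' at the first '='.

    This is only unambiguous if '=' can never appear in a key;
    the value may still contain further '=' characters.
    """
    key, sep, value = text.partition("=")
    if not sep:
        raise ValueError("no '=' found in %r" % text)
    return key, value


if __name__ == "__main__":
    for key in ["highway", "maxspeed:weight>7.5", " name", "名称"]:
        print(repr(key), is_acceptable_key(key), problem_chars(key))
    print(split_tag("foo=bar"))
```

Note that only the listed characters are rejected. Everything else, including non-Latin scripts and uppercase letters, still passes, and tag values are not checked at all.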

Jochen Topf  jochen at remote.org  http://www.remote.org/jochen/  +49-721-388298
