[OSM-dev] Disallowing certain characters in tag keys
bastikln at googlemail.com
Sat Oct 16 20:55:41 BST 2010
What is the problem with escaping problematic characters? There should
be built-in functions for most languages, like uri_escape in Perl and
URLEncoder.encode in Java.
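As a rough illustration of that kind of escaping, here is a minimal Python sketch using the standard library's urllib.parse.quote. The helper name escape_tag_key is made up for this example, and the key is the one mentioned in the quoted mail below; how a given site actually builds its URLs is not something this sketch claims to know.

```python
from urllib.parse import quote, unquote

def escape_tag_key(key):
    """Percent-encode a tag key so it can be used as a single URL path segment."""
    # safe="" makes quote() escape "/" as well; by default it leaves "/" alone.
    return quote(key, safe="")

escaped = escape_tag_key("maxspeed:weight>7.5")
print(escaped)           # maxspeed%3Aweight%3E7.5
print(unquote(escaped))  # maxspeed:weight>7.5
```

Note that this only covers the encoding step. Whether a web server or framework passes an escaped "/" (%2F) through untouched is a separate question.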
This proposal moves values into the key to describe conditions.
(Although you could argue, it should be done like that anyway...)
Jochen Topf wrote:
> I am currently fighting some issues where tags with strange characters in them
> need to be represented in a URL for Taginfo. Lots of other websites probably
> will have similar issues. Characters like /, ?, &, etc. have special meaning
> in URLs so if they appear in tags I can't have those tags in URLs. Sometimes
> escaping characters as %XX helps, sometimes not. And those problems are not
> confined to web pages and URLs only. Special characters that need escaping
> are often a problem.
> We can't really do anything about that with regard to tag values, they must be
> allowed to contain all those characters. But it would help at least a little if
> we knew those characters can never appear in tag keys. And I can't really see a
> legitimate reason why we need those characters in keys. Looking at the database
> almost all cases where they appear in keys are obvious errors. Out of the about
> 20000 different keys, there are only about 190 keys with problematic characters
> in them (another about 800 with whitespace). Really the only case that I can't
> immediately rule out as errors or see an alternative tagging are tag keys like
> "maxspeed:weight>7.5". And with those you can already see the problems: Some of
> them have "&gt;" instead of the ">".
> So I'd like us to think about whether we can disallow a few characters from
> appearing in tag keys. Technically this would mean changing the API to check
> for those characters, removing any that are already in the database (can be
> done with normal manual edits because there are so few cases) and adding checks
> to the editors so that they can give meaningful error messages. Shouldn't be
> too hard.
> So, what characters am I talking about? I haven't drawn up a complete list
> and we certainly would need to discuss this further.
> Here is a preliminary list:
>   Whitespace     Should use '_' instead of whitespace in keys; whitespace is
>                  also very confusing for users, especially at the beginning
>                  and end of a text.
>   <>&/+?#;%'"    Special characters in XML, HTML and/or URLs.
>   \'"            Characters often used for quoting.
>   =              Because it's used in many places as the separation character
>                  between tag key and tag value. If we disallow this, we can
>                  always treat one string like "foo=bar" as k:foo, v:bar
>                  without any problems.
> This is a small list of special characters, all other characters should still
> be allowed. That means tag keys can still be in Chinese or whatever. We'd just
> disallow a few characters of which we know that they will make problems again
> and again.
> And to emphasize this again: I am only talking about tag keys. Tag values must
> be allowed to contain the full Unicode set of characters.
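For illustration only, the kind of check described in the quoted proposal could be sketched in Python roughly as follows. The character set is copied from the preliminary list above and is not a final or agreed set. The function names are invented for this example and do not correspond to anything in the API or the editors.

```python
# Preliminary list from the quoted proposal; an assumption, not an agreed set.
FORBIDDEN_KEY_CHARS = set("<>&/+?#;%'\"\\=")

def forbidden_chars_in_key(key):
    """Return the disallowed characters found in key, in order of first appearance."""
    found = []
    for ch in key:
        # str.isspace() covers spaces, tabs, newlines and other Unicode whitespace.
        if (ch in FORBIDDEN_KEY_CHARS or ch.isspace()) and ch not in found:
            found.append(ch)
    return found

def split_tag(tag):
    """Split 'key=value' at the first '='.

    If '=' can never appear in a key, the first '=' always ends the key,
    even when the value itself contains '='.
    """
    key, sep, value = tag.partition("=")
    if not sep:
        raise ValueError("no '=' in %r" % tag)
    return key, value

print(forbidden_chars_in_key("maxspeed:weight>7.5"))  # ['>']
print(forbidden_chars_in_key("name:zh"))              # []
print(split_tag("name=a=b"))                          # ('name', 'a=b')
```

Returning the offending characters rather than a plain yes/no would let an editor build the kind of meaningful error message the proposal asks for. Everything outside the set and whitespace, including non-Latin scripts, passes unchanged.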