[OSM-dev] Disallowing certain characters in tag keys

Andreas Kalsch andreaskalsch at gmx.de
Sat Oct 16 21:31:47 BST 2010


I agree with disallowing:

whitespace - this can be very confusing

"="

To add:

Make keys lowercase (or even remove diacritics), because keys are always simple names.
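
A rough sketch of what that normalization could look like, assuming Python and its standard unicodedata module. The helper name normalize_key is made up for illustration and is not part of the API or any editor:

```python
import unicodedata

def normalize_key(key: str) -> str:
    """Lowercase a tag key and strip combining diacritical marks.

    Hypothetical helper, only meant to illustrate the suggestion.
    """
    # Decompose characters so that accents become separate combining marks.
    decomposed = unicodedata.normalize("NFD", key)
    # Drop the combining marks (canonical combining class != 0).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left and lowercase it.
    return unicodedata.normalize("NFC", stripped).lower()

print(normalize_key("Café"))
print(normalize_key("NAME:DE"))
```

Keys in scripts without case or diacritics, such as Chinese, pass through unchanged.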

On 16.10.10 20:44, Jochen Topf wrote:
> Hi!
>
> I am currently fighting some issues where tags with strange characters in them
> need to be represented in a URL for Taginfo. Lots of other websites probably
> will have similar issues. Characters like /, ?, &, etc. have special meaning
> in URLs so if they appear in tags I can't have those tags in URLs. Sometimes
> escaping characters as %XX helps, sometimes not. And those problems are not
> confined to web pages and URLs only. Special characters that need escaping
> are often a problem.
>
> We can't really do anything about that with regard to tag values, they must be
> allowed to contain all those characters. But it would help at least a little if
> we knew those characters can never appear in tag keys. And I can't really see a
> legitimate reason why we need those characters in keys. Looking at the database
> almost all cases where they appear in keys are obvious errors. Out of about
> 20000 different keys, there are only about 190 keys with problematic characters
> in them (and about 800 more with whitespace). Really the only case that I can't
> immediately rule out as an error or find an alternative tagging for is tag keys like
> "maxspeed:weight>7.5". And with those you can already see the problems: Some of
> them have "&gt;" instead of the ">".
>
> So I'd like us to think about whether we can disallow a few characters from
> appearing in tag keys. Technically this would mean changing the API to check
> for those characters, removing any that are already in the database (can be
> done with normal manual edits because there are so few cases) and adding checks
> to the editors so that they can give meaningful error messages. Shouldn't be
> too hard.
>
> So, what characters am I talking about? I haven't drawn up a complete list
> and we certainly would need to discuss this further.
>
> Here is a preliminary list:
>
> Whitespace   Should use '_' instead of whitespace in keys, whitespace is
>               also very confusing for users, especially at beginning and end
>               of a text.
>
> <>&/+?#;%'"  Special characters in XML, HTML and/or URLs.
>
> \'"          Characters often used for quoting.
>
> =            Because it's used in many places as the separation character
>               between tag key and tag value. If we disallow this, we can always
>               treat one string like "foo=bar" as k:foo, v:bar without any
>               ambiguities.
>
> This is a small list of special characters, all other characters should still
> be allowed. That means tag keys can still be in Chinese or whatever. We'd just
> disallow a few characters that we know will cause problems again
> and again.
>
> And to emphasize this again: I am only talking about tag keys. Tag values must
> be allowed to contain the full Unicode set of characters.
>
> Jochen
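
To make the proposal concrete, here is a minimal sketch of the kind of check the API or an editor might run on tag keys. It uses the preliminary character list from the message above, which is explicitly still up for discussion. The names FORBIDDEN, problem_characters and split_tag are invented for this example and do not exist in any OSM software:

```python
# Preliminary list from the proposal: <>&/+?#;%'" plus backslash and '='.
# Whitespace is handled separately with str.isspace().
FORBIDDEN = set("<>&/+?#;%'\"\\=")

def problem_characters(key: str) -> list[str]:
    """Return the sorted distinct characters in key that the proposal would disallow."""
    return sorted({ch for ch in key if ch.isspace() or ch in FORBIDDEN})

def split_tag(text: str) -> tuple[str, str]:
    """Split 'foo=bar' into ('foo', 'bar').

    If '=' can never appear in a key, the first '=' is always the
    separator, even when the value itself contains '='.
    """
    key, sep, value = text.partition("=")
    if not sep:
        raise ValueError("no '=' separator in %r" % text)
    return key, value

print(problem_characters("maxspeed:weight>7.5"))
print(split_tag("foo=bar"))
```

An editor could use the list returned by problem_characters to build a meaningful error message instead of just rejecting the key. Only keys are checked; values stay unrestricted.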



