[OSM-dev] Disallowing certain characters in tag keys
Sebastian Klein
bastikln at googlemail.com
Sat Oct 16 20:55:41 BST 2010
Hi,
what is the problem with escaping problematic characters? There should
be built-in functions for most languages, like uri_escape in Perl and
URLEncoder.encode in Java.
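Here is a minimal sketch of that kind of escaping, using the JDK's `java.net.URLEncoder` and `java.net.URLDecoder` (the `Charset` overloads need Java 10 or later). The key is taken from the quoted message below. The host name and the `key` query parameter are hypothetical, not Taginfo's real URL scheme. These classes implement `application/x-www-form-urlencoded`, which is meant for query strings rather than URL paths.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class KeyEscaping {
    // Form-encode a tag key: letters, digits, '.', '-', '*', '_' stay as-is,
    // a space becomes '+', everything else becomes %XX (UTF-8 bytes).
    static String encodeKey(String key) {
        return URLEncoder.encode(key, StandardCharsets.UTF_8);
    }

    // Inverse of encodeKey for query-string use.
    static String decodeKey(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String key = "maxspeed:weight>7.5";
        String encoded = encodeKey(key);
        System.out.println(encoded);            // maxspeed%3Aweight%3E7.5
        System.out.println(decodeKey(encoded)); // maxspeed:weight>7.5
        // Hypothetical query URL, for illustration only.
        System.out.println("https://example.org/keys?key=" + encoded);
    }
}
```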
This proposal [1] moves values into the key to describe conditions.
(Although you could argue that it should be done like that anyway...)
[1]
<http://wiki.openstreetmap.org/wiki/Proposed_features/Extended_conditions_for_access_tags>
Sebastian
Jochen Topf wrote:
> Hi!
>
> I am currently fighting some issues where tags with strange characters in them
> need to be represented in a URL for Taginfo. Lots of other websites probably
> will have similar issues. Characters like /, ?, &, etc. have special meaning
> in URLs so if they appear in tags I can't have those tags in URLs. Sometimes
> escaping characters as %XX helps, sometimes not. And those problems are not
> confined to web pages and URLs only. Special characters that need escaping
> are often a problem.
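To illustrate how %XX escaping can fail, here is a sketch that assumes a hypothetical site putting the key into the URL *path* while escaping it with the form encoder. `URLEncoder` turns a space into `+`, and `java.net.URI.getPath()` decodes `%XX` escapes but treats `+` literally. As a result, two different keys end up at the same path, and a key containing `/` decodes into what looks like an extra path segment.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PathEscapingPitfall {
    // Hypothetical base URL; only the path handling matters here.
    private static final String BASE = "https://example.org/keys/";

    // Form-encode the key into the path, then decode it the way a
    // path-oriented parser does (URI.getPath decodes %XX, leaves '+').
    static String pathAfterRoundTrip(String key) {
        String url = BASE + URLEncoder.encode(key, StandardCharsets.UTF_8);
        return URI.create(url).getPath();
    }

    public static void main(String[] args) {
        System.out.println(pathAfterRoundTrip("a b")); // /keys/a+b
        System.out.println(pathAfterRoundTrip("a+b")); // /keys/a+b (same path as "a b")
        System.out.println(pathAfterRoundTrip("a/b")); // /keys/a/b (looks like two segments)
    }
}
```

Web servers also differ in how they treat an encoded slash (`%2F`) in a path, which adds more inconsistency.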
>
> We can't really do anything about that with regard to tag values, they must be
> allowed to contain all those characters. But it would help at least a little if
> we knew those characters can never appear in tag keys. And I can't really see a
> legitimate reason why we need those characters in keys. Looking at the database,
> almost all cases where they appear in keys are obvious errors. Out of about
> 20000 different keys, only about 190 contain problematic characters (and
> about 800 more contain whitespace). Really the only cases that I can't
> immediately rule out as errors, or find an alternative tagging for, are tag
> keys like "maxspeed:weight>7.5". And with those you can already see the
> problems: some of them have "&gt;" instead of ">".
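The message does not say how the `&gt;` variants were created. One plausible mechanism is escaping text that is already escaped. The sketch below shows it with a hypothetical minimal `xmlEscape` helper, which is not any particular editor's code. Once a key has been stored that way, the literal string `&gt;` is part of the key and survives every later round trip.

```java
public class DoubleEscaping {
    // Hypothetical minimal XML escaper for the five predefined entities.
    static String xmlEscape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String key = "maxspeed:weight>7.5";
        String once = xmlEscape(key);   // correct form inside an XML attribute
        String twice = xmlEscape(once); // result of escaping already-escaped text
        System.out.println(once);  // maxspeed:weight&gt;7.5
        System.out.println(twice); // maxspeed:weight&amp;gt;7.5
    }
}
```

An XML parser that reads the doubly escaped attribute returns `maxspeed:weight&gt;7.5`, which is a different key from `maxspeed:weight>7.5`.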
>
> So I'd like us to think about whether we can disallow a few characters from
> appearing in tag keys. Technically this would mean changing the API to check
> for those characters, removing any that are already in the database (can be
> done with normal manual edits because there are so few cases) and adding checks
> to the editors so that they can give meaningful error messages. Shouldn't be
> too hard.
>
> So, what characters am I talking about? I haven't drawn up a complete list
> and we certainly would need to discuss this further.
>
> Here is a preliminary list:
>
> Whitespace    Should use '_' instead of whitespace in keys; whitespace is
>               also very confusing for users, especially at the beginning
>               and end of a text.
>
> <>&/+?#;%'"   Special characters in XML, HTML and/or URLs.
>
> \'"           Characters often used for quoting.
>
> =             Because it's used in many places as the separation character
>               between tag key and tag value. If we disallow this, we can
>               always treat a string like "foo=bar" as k:foo, v:bar without
>               any ambiguity.
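This sketch shows why the `=` rule removes the ambiguity. If keys can never contain `=`, splitting at the *first* `=` always recovers the key, and the value may still contain `=`. Without the rule, `a=b=c` could mean key `a=b` with value `c` or key `a` with value `b=c`. The `parseTag` name is hypothetical.

```java
import java.util.Map;

public class TagParsing {
    // Split "key=value" at the first '='. This is unambiguous only if keys
    // may not contain '='; the value keeps any further '=' characters.
    static Map.Entry<String, String> parseTag(String s) {
        int i = s.indexOf('=');
        if (i < 0) {
            throw new IllegalArgumentException("no '=' in: " + s);
        }
        return Map.entry(s.substring(0, i), s.substring(i + 1));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"foo=bar", "note=x=y"}) {
            Map.Entry<String, String> tag = parseTag(s);
            System.out.println("k:" + tag.getKey() + ", v:" + tag.getValue());
        }
        // prints "k:foo, v:bar" then "k:note, v:x=y"
    }
}
```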
>
> This is a small list of special characters, all other characters should still
> be allowed. That means tag keys can still be in Chinese or whatever. We'd just
> disallow a few characters of which we know that they will make problems again
> and again.
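Here is a sketch of the kind of check the API or an editor might run against the preliminary list. The class and method names are hypothetical. Whitespace is detected with `Character.isWhitespace` plus `Character.isSpaceChar`, so U+00A0 (no-break space) also counts. That choice is an assumption about what "whitespace" should cover. Every other character, including non-Latin scripts, is accepted.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyValidator {
    // Preliminary list from the message: XML/HTML/URL specials, quoting
    // characters and '='. Whitespace is handled separately below.
    private static final String FORBIDDEN = "<>&/+?#;%'\"\\=";

    // Returns human-readable problems; an empty list means the key passes.
    static List<String> problems(String key) {
        List<String> errors = new ArrayList<>();
        for (int i = 0; i < key.length(); ) {
            int cp = key.codePointAt(i); // iterate by code point, not char
            if (Character.isWhitespace(cp) || Character.isSpaceChar(cp)) {
                errors.add("whitespace at index " + i + " (use '_' instead)");
            } else if (FORBIDDEN.indexOf(cp) >= 0) {
                errors.add("character '" + new String(Character.toChars(cp))
                        + "' at index " + i + " is not allowed in keys");
            }
            i += Character.charCount(cp);
        }
        return errors;
    }

    public static void main(String[] args) {
        String[] keys = {"highway", "name:zh", "\u540d\u79f0", "maxspeed:weight>7.5", " ref"};
        for (String key : keys) {
            System.out.println("'" + key + "' -> " + problems(key));
        }
    }
}
```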
>
> And to emphasize this again: I am only talking about tag keys. Tag values must
> be allowed to contain the full Unicode set of characters.
>
> Jochen