[OSM-dev] Disallowing certain characters in tag keys

Sebastian Klein bastikln at googlemail.com
Sat Oct 16 20:55:41 BST 2010


Hi,

what is the problem with escaping problematic characters? There should 
be built-in functions for most languages, like uri_escape in Perl and 
URLEncoder.encode in Java.
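
As a rough illustration of what I mean, here is a minimal sketch in Python (chosen only for the example; the helper name escape_tag_part is made up). It uses urllib.parse.quote with safe="" so that "/" is percent-encoded as well, which quote() would otherwise leave alone:

```python
from urllib.parse import quote, unquote


def escape_tag_part(text: str) -> str:
    """Percent-encode a tag key or value for use in one URL path segment."""
    # safe="" means "/" is escaped too; by default quote() treats it as safe.
    return quote(text, safe="")


key = "maxspeed:weight>7.5"
escaped = escape_tag_part(key)
print(escaped)                  # maxspeed%3Aweight%3E7.5
print(unquote(escaped) == key)  # True
```

Whether the web server and framework pass the escaped form through unchanged is a separate question, which may be the "sometimes not" case in the quoted message.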

This proposal [1] moves values into the key to describe conditions. 
(Although you could argue it should be done like that anyway...)

[1] <http://wiki.openstreetmap.org/wiki/Proposed_features/Extended_conditions_for_access_tags>


Sebastian


Jochen Topf wrote:
> Hi!
> 
> I am currently fighting some issues where tags with strange characters in them
> need to be represented in a URL for Taginfo. Lots of other websites probably
> will have similar issues. Characters like /, ?, &, etc. have special meaning
> in URLs so if they appear in tags I can't have those tags in URLs. Sometimes
> escaping characters as %XX helps, sometimes not. And those problems are not
> confined to web pages and URLs only. Special characters that need escaping
> are often a problem.
> 
> We can't really do anything about that with regard to tag values, they must be
> allowed to contain all those characters. But it would help at least a little if
> we knew those characters can never appear in tag keys. And I can't really see a
> legitimate reason why we need those characters in keys. Looking at the database
> almost all cases where they appear in keys are obvious errors. Out of about
> 20000 different keys, there are only about 190 keys with problematic characters
> in them (plus about 800 with whitespace). Really the only case that I can't
> immediately rule out as an error or see an alternative tagging for is tag keys like
> "maxspeed:weight>7.5". And with those you can already see the problems: Some of
> them have "&gt;" instead of the ">".
> 
> So I'd like us to think about whether we can disallow a few characters from
> appearing in tag keys. Technically this would mean changing the API to check
> for those characters, removing any that are already in the database (can be
> done with normal manual edits because there are so few cases) and adding checks
> to the editors so that they can give meaningful error messages. Shouldn't be
> too hard.
> 
> So, what characters am I talking about? I haven't drawn up a complete list
> and we certainly would need to discuss this further.
> 
> Here is a preliminary list:
> 
> Whitespace   Use '_' instead of whitespace in keys. Whitespace is
>              also very confusing for users, especially at the beginning and end
>              of a text.
> 
> <>&/+?#;%'"  Special characters in XML, HTML and/or URLs.
> 
> \'"          Characters often used for quoting.
> 
> =            Because it's used in many places as the separation character
>              between tag key and tag value. If we disallow this, we can always
>              treat one string like "foo=bar" as k:foo, v:bar without any
>              ambiguities.
> 
> This is a small list of special characters, all other characters should still
> be allowed. That means tag keys can still be in Chinese or whatever. We'd just
> disallow a few characters that we know will cause problems again
> and again.
> 
> And to emphasize this again: I am only talking about tag keys. Tag values must
> be allowed to contain the full Unicode set of characters.
> 
> Jochen
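
For reference, the preliminary list in the quoted message could be checked with something like the following Python sketch. This is not how the API or any editor does it; the names FORBIDDEN_KEY_CHARS, invalid_key_chars and split_tag are hypothetical, and the character set is simply copied from the list above (which is itself still up for discussion). The second function shows the "foo=bar" point: if "=" can never appear in a key, splitting at the first "=" is unambiguous.

```python
# Characters copied from the preliminary list in the quoted message;
# whitespace is handled separately via str.isspace().
FORBIDDEN_KEY_CHARS = set("<>&/+?#;%'\"\\=")


def invalid_key_chars(key: str) -> list[str]:
    """Return the disallowed characters in key, in order of first appearance."""
    found = []
    for ch in key:
        if (ch in FORBIDDEN_KEY_CHARS or ch.isspace()) and ch not in found:
            found.append(ch)
    return found


def split_tag(text: str) -> tuple[str, str]:
    """Split "key=value" at the first "=" (assumes keys never contain "=")."""
    key, sep, value = text.partition("=")
    if not sep:
        raise ValueError("no '=' in tag string")
    return key, value


print(invalid_key_chars("maxspeed:weight>7.5"))  # ['>']
print(invalid_key_chars("addr:street"))          # []
print(split_tag("foo=bar"))                      # ('foo', 'bar')
print(split_tag("note=a=b"))                     # ('note', 'a=b')
```

Non-ASCII keys (Chinese, for example) pass this check untouched, since only the listed characters and whitespace are rejected.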



