[OSM-dev] Disallowing certain characters in tag keys
jochen at remote.org
Sat Oct 16 22:02:25 BST 2010
On Sat, Oct 16, 2010 at 09:55:41PM +0200, Sebastian Klein wrote:
> what is the problem with escaping problematic characters? There should
> be built-in functions for most languages, like uri_escape in Perl and
> URLEncoder.encode in Java.
The problem is that it's very hard to get this right. I have just spent an hour
debugging a problem where the semicolon (;) character in a URL was mishandled
by Apache. The mod_proxy module I use in Taginfo silently cuts off everything
after a ; in the URL even if it's escaped. That's probably because the ; is
handled specially for some reason in the RFC defining URLs. I found an option
to fix this, but it doesn't work with mod_rewrite, so that had to be worked
around, too. I got it to work, but I don't want to know what the next problem
will be. It just goes to show that even software like Apache has problems
dealing with these things, not to speak of some scripts somebody just hacked
together.
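To illustrate the special status of ";", here is a small Python sketch. The key name and the example.org URL are made up, and it only shows how one standard URL parser treats the character; it says nothing about what Apache does internally.

```python
from urllib.parse import quote, unquote, urlparse

key = "name;type"  # hypothetical tag key containing a semicolon

# Percent-encode everything that is not an unreserved character.
escaped = quote(key, safe="")          # "name%3Btype"

# With the key escaped, the whole key stays in the path.
parsed = urlparse("http://example.org/keys/" + escaped)
print(parsed.path, repr(parsed.params))

# Unescaped, the parser splits at ";" and treats the rest as "params".
raw = urlparse("http://example.org/keys/" + key)
print(raw.path, repr(raw.params))

# Unescaping exactly once gives the original key back.
print(unquote(escaped) == key)
```

Any component between client and application that decodes the URL early, or splits at ";" before decoding, can lose the second half of the key.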
I already mentioned the keys in the database with "&gt;" or so in there,
probably from some software escaping once too often. Special characters must
be escaped exactly once. If they are not escaped or escaped more than once, you
get broken results. And on the other side you have to de-escape exactly once.
This is difficult to get right. And the penalty for not getting it right might
just be an SQL injection or a cross-site-scripting attack vector.
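A minimal sketch of the "exactly once" rule, using HTML escaping from the Python standard library. The key is made up; the point is only what happens when one side escapes twice and the other side unescapes once.

```python
import html

key = "a>b"  # made-up key containing a special character

once = html.escape(key)     # "a&gt;b": correct for embedding in HTML
twice = html.escape(once)   # "a&amp;gt;b": escaped once too often

# Unescaping once removes only one layer, so a literal "&gt;" survives
# and would end up stored as part of the key.
stored = html.unescape(twice)

print(once)
print(twice)
print(stored)
```

The same thing happens with percent-encoding in URLs, where a doubly escaped ";" turns into "%253B" instead of "%3B".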
Not allowing those characters in the first place makes software easier to
write and understand and more robust. And it even makes it friendlier for
humans, because if they use those characters you can immediately give them an
error message instead of creating broken data.
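Rejecting such keys up front could look roughly like the sketch below. The set of forbidden characters and the function name check_key are assumptions for illustration only; which characters to disallow is exactly what is under discussion.

```python
# Hypothetical deny-list, not an agreed-upon set of characters.
FORBIDDEN_CHARS = set(";&<>\"'/?#% ")


def check_key(key: str) -> None:
    """Raise ValueError if the tag key contains a forbidden character."""
    bad = sorted(set(key) & FORBIDDEN_CHARS)
    if bad:
        raise ValueError(
            f"tag key {key!r} contains disallowed character(s): {' '.join(bad)}"
        )


check_key("addr:housenumber")  # passes silently

try:
    check_key("name;type")
except ValueError as err:
    print(err)  # the user sees the problem immediately
```

With a check like this at the point of entry, nothing downstream has to escape or unescape these characters in keys at all.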
And all of that without any cost, really. I don't see that we ever need those
characters in tag keys. Of course if we do need those characters then we have
to get all of this right and right every time.
> This proposal moves values into the key to describe conditions.
> (Although you could argue, it should be done like that anyway...)
I think that's a misguided use of tag keys, probably invented by people who have
never actually tried to write code that tries to interpret OSM tags.
Jochen Topf jochen at remote.org http://www.remote.org/jochen/ +49-721-388298