[OSM-dev] Disallowing certain characters in tag keys

Jochen Topf jochen at remote.org
Tue Oct 19 09:22:48 BST 2010


On Sun, Oct 17, 2010 at 04:57:33PM -0400, Anthony wrote:
> On Sat, Oct 16, 2010 at 2:44 PM, Jochen Topf <jochen at remote.org> wrote:
> > Technically this would mean changing the API to check
> > for those characters, removing any that are already in the database (can be
> > done with normal manual edits because there are so few cases) and adding checks
> > to the editors so that they can give meaningful error messages.
> 
> To be clear, they'd still be in the database, in the history.
> 
> Which is one implementation problem, because it means putting checks
> in more than one different place.  At the very least, the regular API,
> and the Potlatch API, but there are probably multiple places within
> the regular API where things would need to be checked.

But that's far fewer places than all the software out there. The whole point
of an API is that it's a sort of "choke point", a single place where things
can be checked.
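
As a rough illustration of the "choke point" idea, here is a minimal Python
sketch of a single check that every write path could call. The set of
forbidden characters and the names FORBIDDEN_KEY_CHARS, find_forbidden_chars
and check_tag_key are assumptions made for this example; they are not taken
from the actual API code, and this message does not spell out the exact
character list.

```python
# Hypothetical set of characters that are awkward in URLs and XAPI-style
# queries. The exact list is an assumption for illustration only.
FORBIDDEN_KEY_CHARS = frozenset('/=&?#[]*+')


def find_forbidden_chars(key):
    """Return the sorted list of forbidden characters present in a tag key."""
    return sorted(set(key) & FORBIDDEN_KEY_CHARS)


def check_tag_key(key):
    """Return the key unchanged, or raise ValueError with a readable message.

    Non-ASCII characters are deliberately allowed; only the small
    forbidden set above is rejected.
    """
    bad = find_forbidden_chars(key)
    if bad:
        raise ValueError(
            "tag key %r contains disallowed character(s): %s"
            % (key, " ".join(bad))
        )
    return key
```

An editor could call the same kind of function before uploading, so that the
user sees a meaningful error message instead of a rejected request.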

> And then any software which relies on these changes wouldn't work with
> historical data.

That's a problem, you are right. We could solve that by faking the history. It
would not be the first time this has been done, so it is possible. But most
software out there only deals with current data. So even if we keep the
history as it is, that software would be made easier.

> It could be done, but to do all that work just to make it easier to
> code Taginfo would be, in my opinion, a waste.  Especially when there
> are plenty of simple solutions within taginfo.  If URL encoding is too
> painful, use a modified base64 encoding of the unicode string (using
> "-" and "_" instead of "+" and "/").

It's not only Taginfo. All software out there would be made easier. If this
were a Taginfo-only problem I wouldn't propose it. One of the biggest
problems is that Taginfo doesn't work alone, but is meant to work with other
tools. If I used base64 encoding then people would need to link to something
like http://taginfo.openstreetmap.de/keys/aGlnaHdheQo= instead of
http://taginfo.openstreetmap.de/keys/highway. And the link to XAPI would
then not be http://www.informationfreeway.org/api/0.6/*[highway=*] but
http://www.informationfreeway.org/api/0.6/*[aGlnaHdheQo==*] . Not very user
friendly. And then every service would probably use a different encoding
scheme...
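
To make the comparison concrete, here is a small Python sketch of the two
encodings under discussion: the modified base64 (with "-" and "_" instead of
"+" and "/") and plain URL percent-encoding. The helper names
key_to_base64url and key_to_percent are made up for this example. Note that
the encoded string quoted above, aGlnaHdheQo=, decodes to "highway" followed
by a newline; encoding exactly "highway" gives aGlnaHdheQ==.

```python
import base64
from urllib.parse import quote


def key_to_base64url(key):
    """Encode a tag key as URL-safe base64 of its UTF-8 bytes."""
    return base64.urlsafe_b64encode(key.encode("utf-8")).decode("ascii")


def key_to_percent(key):
    """Percent-encode a tag key, escaping every reserved character."""
    return quote(key, safe="")


if __name__ == "__main__":
    for key in ("highway", "addr:street", "a/b"):
        print(key, key_to_base64url(key), key_to_percent(key))
```

Percent-encoding leaves ordinary keys such as "highway" readable and only
changes the problematic ones, while base64 makes every key unreadable, which
is the usability problem described above.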

I have actually thought about that and might offer a secondary interface
to Taginfo using base64 or something like it if I can't avoid it. But that's
really ugly and probably nobody would use it anyway, because nobody wants
to write special cases for the few keys that use those characters and are
bogus anyway.

> For cleaning up the keys, I'd want to strip down to as few characters
> as possible.  There's no point supporting most unicode characters -
> keys are supposed to be in English.

No. English people should be allowed to use their own language if they
want to. So should speakers of every other language on the planet, too.

Jochen
-- 
Jochen Topf  jochen at remote.org  http://www.remote.org/jochen/  +49-721-388298



