[OSM-dev] Restrict key names on order to retain reusability of OSM

Robert (Jamie) Munro rjmunro at arjam.net
Tue Feb 12 15:25:50 GMT 2008


Stefan Keller wrote:
| Hi all,
| I just have finished a converter of OSM xml format to GML and I *BOLDLY*
| suggest to constrain the allowed characters of tags (= key-names) to the
| following XML related set:
| 'aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ_' in order to
| retain reusability.
| After having looked at more than 100 MB of data we found in key names
| characters like space, slashes, colons and even more weird ones. I don't
| think this will take too much of users' freedom of choice...
| What do you think about agreeing on such a character list and subsequently
| building this into editors like JOSM in order to get clean key names from
| the beginning?

If the system won't pass UTF-8 through cleanly, encode the strings with
either UTF-7 or Punycode. We should probably decide which one people prefer.

For example,
String:   Zürich
Punycode: xn--Zrich-kva
UTF-7:    Z+APw-rich

UTF-7 is probably simpler to understand.
Punycode is really weird, but has been adopted for domain names.
Punycode is probably more compact for strings that contain many non-ASCII
characters (but it is longer in the above example).

Which would people prefer to see used to allow OSM data to be round-tripped
through environments that are not Unicode-aware (or not 8-bit clean)?

Robert (Jamie) Munro

