[OSM-dev] What subset of UTF-8 should the API accept?

Thu Jul 16 19:07:17 BST 2009

The OSM protocol specifies that it accepts UTF-8 data, but in reality
it only accepts the subset of UTF-8 that the XML parser in use
doesn't barf on. See:

http://lists.openstreetmap.org/pipermail/dev/2009-July/016165.html

This issue surfaces e.g. here:

http://trac.openstreetmap.org/ticket/2072

So what subset should the API specify? If it's to accept full UTF-8,
all the tools that parse the XML will have to learn to deal with
control characters.
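
To make the question concrete, here is a rough sketch of one candidate
subset: the characters allowed by the "Char" production of XML 1.0,
which excludes the C0 control characters other than tab, LF and CR, as
well as surrogates and U+FFFE/U+FFFF. This is only an illustration of
what such a check could look like, not what the API currently does.
The names is_xml10_safe and strip_xml10_invalid are made up for this
example and do not exist in any OSM tool.

```python
import re

# Assumption: the "safe" subset is the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The pattern matches any single character OUTSIDE that set.
_XML10_INVALID = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_xml10_safe(text: str) -> bool:
    """Return True if every character in text is allowed in XML 1.0."""
    return _XML10_INVALID.search(text) is None


def strip_xml10_invalid(text: str) -> str:
    """Return text with every character XML 1.0 forbids removed."""
    return _XML10_INVALID.sub("", text)


if __name__ == "__main__":
    tag_value = "High Street\x08"  # trailing backspace control character
    print(is_xml10_safe(tag_value))
    print(repr(strip_xml10_invalid(tag_value)))
```

Whether the API should reject such input or silently strip it is a
separate decision; the sketch shows both options.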