[OSM-dev] What subset of UTF-8 should the API accept?

Andy Allan gravitystorm at gmail.com
Thu Jul 16 19:52:10 BST 2009

On Thu, Jul 16, 2009 at 7:07 PM, Ævar Arnfjörð
Bjarmason <avarab at gmail.com> wrote:
> The OSM protocol specifies that it accepts UTF-8 data, but in reality
> it only accepts the subset of UTF-8 that the XML parser being used
> doesn't barf on, see:
> http://lists.openstreetmap.org/pipermail/dev/2009-July/016165.html
> This issue surfaces e.g. here:
> http://trac.openstreetmap.org/ticket/2072
> So what subset should the API specify?

Umm, I answered that in the thread you just linked to. See:

> If it's to accept full UTF-8
> all the tools that parse the XML will have to learn to deal with
> control characters.

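Which, by definition, isn't possible in XML: XML 1.0 forbids every
control character below U+0020 except tab, LF and CR, and they can't
be smuggled in as character references either.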
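
For illustration only, here is a minimal Python sketch of the kind of
check a client could run on tag values before uploading. The character
ranges are the ones in the XML 1.0 "Char" production; the function names
are hypothetical and not part of the OSM API or any existing tool, and
the parser check uses Python's bundled expat-based ElementTree rather
than whatever parser the API itself uses.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Anything NOT matched by the XML 1.0 "Char" production:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
_XML10_INVALID = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_xml10_safe(text):
    """Return True if every character in text is legal in XML 1.0."""
    return _XML10_INVALID.search(text) is None


def strip_xml10_invalid(text):
    """Drop characters that XML 1.0 cannot represent (hypothetical policy)."""
    return _XML10_INVALID.sub("", text)


def parser_accepts(text):
    """Return True if ElementTree parses text as element content."""
    try:
        # escape() handles &, < and > but leaves control characters alone.
        ET.fromstring("<v>" + escape(text) + "</v>")
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for value in ["Ævar Arnfjörð", "bad\x08value"]:
        print(repr(value), is_xml10_safe(value), parser_accepts(value))
```

Whether the API should reject such values outright or strip the
offending characters is a policy question this sketch doesn't settle;
it only shows where the line falls.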

