[OSM-dev] Non-ASCII characters in XML generated from PostGIS

Andy Allan gravitystorm at gmail.com
Fri May 18 08:48:44 BST 2012


On 17 May 2012 20:51, Nick Whitelegg <Nick.Whitelegg at solent.ac.uk> wrote:
> Hi,
>
> I'm having some problems with generating XML from a postgis database from
> PHP on the Freemap server:
>
> http://www.free-map.org.uk/0.6/ws//bsvr.php?bbox=440000.0,110000.0,445000.0,115000.0&poi=place,amenity,natural&annotation=1&inProj=27700&outProj=epsg:4326
>
> It's basically falling over on the French acute 'e' accent on one of the
> points of interest. This is the first time I've had this problem.
> I specify the encoding as 'iso-8859-1' in the <?xml?> prolog which I thought
> was the way to deal with this, but no luck. I'm guessing therefore it's an
> issue with the way that PHP and/or Postgres are set
> up.
>
> On the client side I get an XML parsing error with either Firefox or the XML
> parser in Android.
>
> In the database Cafe (acute e) is encoded as "Caf<C3><A9>". It's postgis
> 1.5.1, postgres 8.4 and PHP 5.3.3. I'm guessing several other people have
> had this issue... is anyone able to offer any pointers?

Hi Nick,

There are a few issues colliding here.

First of all, you're returning HTML entities in an XML document.
That's going to throw an error. In your example, you have a named
entity reference (shown here decoded, as à), which isn't defined for
XML, only for HTML.

http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
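
To illustrate the entity problem, here is a minimal sketch in Python
(used only for illustration; your server is PHP, and the "poi" element
and "name" attribute are made-up names, not your actual output). It
shows a strict XML parser rejecting an HTML named entity while
accepting a numeric character reference or the literal character:

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical one-element documents, for illustration only.
html_entity = '<poi name="Caf&eacute;"/>'   # HTML-only named entity
numeric_ref = '<poi name="Caf&#233;"/>'     # numeric character reference
literal_char = '<poi name="Caf\u00e9"/>'    # the character itself

for label, doc in [("named entity", html_entity),
                   ("numeric reference", numeric_ref),
                   ("literal character", literal_char)]:
    print(label, "->", "ok" if parses(doc) else "parse error")
```

Without a DTD, XML only predefines &amp;, &lt;, &gt;, &apos; and
&quot;, so anything else has to be a numeric reference or a literal
character in the document's declared encoding.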

Secondly, you're getting your character encodings in a muddle. The
database is storing your e-acute in UTF-8. It's Unicode character
U+00E9, represented in the UTF-8 encoding as two bytes (C3 and
A9).

http://www.fileformat.info/info/unicode/char/e9/index.htm <- look at
the UTF-8 (hex) section
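
As a quick check of the byte values, here is a small Python sketch
(again only for illustration, not what your PHP code does). It encodes
e-acute both ways and shows what happens when UTF-8 bytes are read
back as ISO-8859-1, which is the kind of mismatch you get when the
prolog declares one encoding and the bytes are in another:

```python
e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

utf8_bytes = e_acute.encode("utf-8")         # two bytes
latin1_bytes = e_acute.encode("iso-8859-1")  # one byte

print(utf8_bytes.hex(" "))    # c3 a9
print(latin1_bytes.hex(" "))  # e9

# UTF-8 bytes misread as ISO-8859-1 become two separate characters:
# U+00C3 (A with tilde) followed by U+00A9 (copyright sign).
misread = utf8_bytes.decode("iso-8859-1")
print([hex(ord(c)) for c in misread])  # ['0xc3', '0xa9']

# Characters outside Western European scripts cannot be encoded
# in ISO-8859-1 at all.
try:
    "\u03b1".encode("iso-8859-1")  # Greek small letter alpha
    greek_fits = True
except UnicodeEncodeError:
    greek_fits = False
print(greek_fits)  # False
```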

I don't understand why you're trying to force the UTF-8 characters
into an ISO-8859-1 encoding - which has barely enough code points to
cover Western European languages, never mind Greek, Russian or any
other OSM data. Stick to UTF-8 encoding in your XML, and you should be
fine. Provided, of course, you can persuade PHP that it's an XML
rather than an HTML document, to sort out the entity encoding as I
explained above.
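
For what it's worth, the end-to-end idea looks roughly like this in
Python (a sketch only; the element and attribute names are
hypothetical, and in PHP you would use its own XML facilities rather
than this library). Text goes in as Unicode, the writer emits UTF-8
bytes with a matching declaration, and no named entities are involved:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical structure, for illustration only.
root = ET.Element("pois")
ET.SubElement(root, "poi", name="Caf\u00e9")                    # e-acute
ET.SubElement(root, "poi", name="\u0391\u03b8\u03ae\u03bd\u03b1")  # Greek text

# Serialise as UTF-8 with an explicit XML declaration.
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="utf-8", xml_declaration=True)
xml_bytes = buf.getvalue()

# The e-acute is written as the two raw UTF-8 bytes C3 A9.
print(b"Caf\xc3\xa9" in xml_bytes)

# A strict parser reads the bytes back without complaint.
names = [poi.get("name") for poi in ET.fromstring(xml_bytes)]
print(names == ["Caf\u00e9", "\u0391\u03b8\u03ae\u03bd\u03b1"])
```

The point is that the declared encoding and the actual bytes agree,
so the client-side parser has nothing to trip over.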

Cheers,
Andy


