[OSM-dev] REST API encoding

Thomas Walraet thomas at walraet.com
Wed Jul 12 17:17:59 BST 2006

Dean Earley wrote:
>> Here is the reply for a single segment:
>>>     <tag k="name" v="t&#xE9;st"/>
> <SNIP>
>> For a single way:
>>>     <tag k="name" v="t&#xE9;st"/>
> <SNIP>
>> E9 is the hex code for é in ISO-8859-1
>> C3A9 is the hex code for é in UTF-8
> Technically, these are neither.
> They happen to be the hex-encoded representations of the aforementioned
> encoding formats. Saying "it is 8859-1" or "it is UTF-8" means the data is
> in that format raw (not hex-encoded).

I think I understand your point. It means that the server only sends
ASCII characters and that clients don't have to specify an encoding when
reading the stream from the server. (Currently, JOSM uses an
InputStreamReader without an explicit encoding; I proposed a patch to
Imi, but he wisely didn't apply it. The applet uses an Apache component
configured to read an ISO-8859-1 stream.)
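
Why the reader's charset stops mattering can be sketched in Python: if every byte the server sends is ASCII, any ASCII-compatible decoder (ISO-8859-1, UTF-8, or a platform default such as windows-1252) yields identical text. The payload is a hypothetical sample modeled on the replies quoted above, and the Java readers themselves are not exercised here. The &#xE9; reference survives decoding as plain text and is resolved later by the XML parser.

```python
# A hypothetical ASCII-only response line, as the server would send it.
payload = b'<tag k="name" v="t&#xE9;st"/>'

# Charsets a client might end up using; all are ASCII-compatible.
charsets = ["ascii", "iso-8859-1", "utf-8", "cp1252"]
decoded = {cs: payload.decode(cs) for cs in charsets}

# Every decoder yields the same text, so the client's choice does not matter.
print(len(set(decoded.values())) == 1)  # True
print(decoded["utf-8"])                 # <tag k="name" v="t&#xE9;st"/>
```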

The server just has to encode strings from its internal representation
into Unicode character references (the &#xXX; things), and clients have
to decode them.
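
A minimal sketch of that round trip in Python, assuming the server escapes every non-ASCII character as a hexadecimal reference like the &#xE9; in the replies quoted above. `to_ascii_refs`, `encode_tag` and `decode_tag` are hypothetical helpers for illustration, not part of the OSM server or any client. On the client side, a conforming XML parser resolves the references itself.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def to_ascii_refs(text: str) -> str:
    """Replace each non-ASCII character with a hex numeric character reference."""
    return "".join(ch if ord(ch) < 0x80 else "&#x%X;" % ord(ch) for ch in text)


def encode_tag(key: str, value: str) -> bytes:
    """Server side (hypothetical): build an ASCII-only <tag/> element."""
    # quoteattr escapes &, <, > and quotes first; the references added
    # afterwards must not be escaped a second time.
    xml = "<tag k=%s v=%s/>" % (to_ascii_refs(quoteattr(key)),
                                 to_ascii_refs(quoteattr(value)))
    return xml.encode("ascii")


def decode_tag(payload: bytes) -> tuple[str, str]:
    """Client side (hypothetical): the XML parser resolves the references."""
    elem = ET.fromstring(payload)
    return elem.get("k"), elem.get("v")


payload = encode_tag("name", "t\u00e9st")
print(payload)              # b'<tag k="name" v="t&#xE9;st"/>'
print(decode_tag(payload))  # ('name', 'tést')
```

Python's built-in `"xmlcharrefreplace"` error handler (`s.encode("ascii", "xmlcharrefreplace")`) does a similar job but emits decimal references such as &#233;, which XML parsers accept equally.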

If this is correct, I should remove what I added to the REST page (the
note about using an <?xml version='1.0' encoding='UTF-8'?> header in
server responses).
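
A quick Python check of whether that header matters for an ASCII-only body. XML 1.0 treats a document with no encoding declaration and no byte order mark as UTF-8, and ASCII is a subset of both UTF-8 and ISO-8859-1, so this sketch (using a hypothetical sample body) expects the same attribute value in all three cases. The declaration would only become significant if the server started sending raw non-ASCII bytes.

```python
import xml.etree.ElementTree as ET

# Hypothetical ASCII-only response body using a character reference.
body = b'<tag k="name" v="t&#xE9;st"/>'

variants = {
    "no declaration": body,
    "UTF-8 declared": b"<?xml version='1.0' encoding='UTF-8'?>" + body,
    "ISO-8859-1 declared": b"<?xml version='1.0' encoding='ISO-8859-1'?>" + body,
}

# Parse each variant and collect the decoded attribute value.
values = {name: ET.fromstring(doc).get("v") for name, doc in variants.items()}
for name, v in values.items():
    print(f"{name:20} -> {v}")
```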

For client requests, JOSM and the applet currently use UTF-8 encoding,
and it seems to work (except for way tags, which the server serves back
wrongly). Is this behavior considered OK, or would it be better to switch
to &#xXX; references to encode characters outside ASCII?
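
One plausible explanation for the way tags, which the thread does not confirm: the server may be decoding the UTF-8 request body as ISO-8859-1 somewhere on the way path. The Python sketch below uses a hypothetical `server_reads` helper to compare a raw UTF-8 body with an ASCII-only body using references, under a correct and a mis-configured decoder. Only the raw UTF-8 body is affected by the wrong charset: é turns into the two characters Ã©, which a server escaping non-ASCII output would presumably send back as &#xC3;&#xA9;.

```python
import xml.etree.ElementTree as ET

value = "t\u00e9st"

# Two ways a client could send the same tag value in a request body.
raw_utf8 = ('<tag k="name" v="%s"/>' % value).encode("utf-8")
ascii_refs = b'<tag k="name" v="t&#xE9;st"/>'


def server_reads(body: bytes, charset: str) -> str:
    """Hypothetical server: decode the body with `charset`, then parse it."""
    text = body.decode(charset)
    return ET.fromstring(text).get("v")


for label, body in [("raw UTF-8", raw_utf8), ("ASCII refs", ascii_refs)]:
    ok = server_reads(body, "utf-8")
    misread = server_reads(body, "iso-8859-1")
    print(f"{label:10} utf-8 -> {ok!r}, iso-8859-1 -> {misread!r}")
```

With references, the request body is pure ASCII and survives either decoder; with raw UTF-8, correctness depends on the server choosing the right charset.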
