[OSM-dev] broken utf8 in minute changeset 200907140650
Richard Fairhurst
richard at systemed.net
Thu Jul 16 12:29:35 BST 2009
Tom Hughes wrote:
> Your test app says, for Ä:
>
> Flash stores: c3 20 1e
> Server received: C3 83 E2 80 9E
>
> Now Ä is 0xc4 in 8859-1 which encodes to UTF-8 as 0xc3 0x84 so
> something is going wrong at the first stage still.
Right.
0x201E is Unicode for „ ("double low-9 quotation mark").
„ is 0x84 in ye olde traditional Windows character set (CP1252).
The proper UTF8 is 0xC3 0x84.
Similarly, Roland (from earlier e-mail) is getting 0xC3 0x2013 when typing
Ö.
0x2013 is Unicode for – (en dash).
– is 0x96 in CP1252.
The proper UTF8 is 0xC3 0x96.
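For anyone who wants to check those two lookups, here is a minimal sketch in Python. It is not anything Flash or Potlatch actually runs, and the helper name stray_char is made up for illustration. It assumes Python's built-in cp1252 codec matches the Windows table.

```python
import unicodedata

def stray_char(letter: str) -> str:
    """Return the second UTF-8 byte of `letter`, read as a CP1252 character."""
    utf8 = letter.encode("utf-8")
    return bytes([utf8[1]]).decode("cp1252")

for letter in ("Ä", "Ö"):
    stray = stray_char(letter)
    # Show the correct UTF-8 bytes, then the code point the second byte turns into.
    print(letter, letter.encode("utf-8").hex(" "), "->",
          f"U+{ord(stray):04X}", unicodedata.name(stray))
```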
So if I read this right, Flash is sometimes _triple_-encoding characters. It
gets Ä, encodes it as UTF8 (0xC3 0x84), then reads the 0x84 byte back as a
CP1252 character and maps it to Unicode (0x201E). All of this is within the
textField. Then, of course, it encodes the lot again when transmitting to the
server, which is how 0xC3 0x201E ends up on the wire as C3 83 E2 80 9E.
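The theory can be modelled in a few lines of Python. This is only a sketch of my reading of the symptoms, not Flash's real code path, and the function names flash_mangle and repair are hypothetical.

```python
def flash_mangle(text: str) -> bytes:
    """Hypothetical model of the bug: UTF-8 bytes re-read as CP1252, then sent as UTF-8."""
    in_text_field = text.encode("utf-8").decode("cp1252")  # what the textField ends up holding
    return in_text_field.encode("utf-8")                   # what goes over the wire

def repair(received: bytes) -> str:
    """Undo the extra round trip, assuming every character maps back to CP1252."""
    return received.decode("utf-8").encode("cp1252").decode("utf-8")

wire = flash_mangle("Ä")
print(wire.hex(" ").upper())  # compare with the "Server received" bytes quoted above
print(repair(wire))
```

Note that Python's cp1252 codec leaves a handful of bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so flash_mangle raises UnicodeDecodeError for a character such as Á (UTF-8 0xC3 0x81). The sketch only covers the cases seen in this thread.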
I hope Adobe's programmers have a 24-hour armed guard.
cheers
Richard
--
View this message in context: http://www.nabble.com/broken-utf8-in-minute-changeset-200907140650-tp24475713p24514695.html
Sent from the OpenStreetMap - Dev mailing list archive at Nabble.com.