[OSM-dev] broken utf8 in minute changeset 200907140650
richard at systemed.net
Tue Jul 14 16:19:14 BST 2009
Ævar Arnfjörð Bjarmason wrote:
> * Potlatch will enter whatever raw binary string the user
> supplies into the database that the main API would reject
> as an invalid request, hence the corrupt data
From a client point of view, the bug you filed is that Linux Flash Player
has long been broken beyond belief and doesn't permit non-ASCII characters
to be entered into a textfield. (See http://bugs.adobe.com/jira/browse/FP-40
This morning is actually a different issue AFAICT. Potlatch (the SWF client)
has long used an ActionScript method, textField.restrict, to prevent control
characters (0x00-0x1F) being input into textfields. Unfortunately the latest
version of Ming (the open-source Flash compiler used to compile Potlatch),
0.4.2, appears to be broken and will not compile textField.restrict
correctly - it randomly uppercases character input (letters D to U, IIRC)
which is a whole heap of no good for entering tags. (See
Consequently when I needed to commit a new revision of Potlatch at SOTM, and
only had a laptop with 0.4.2 installed, this check was temporarily removed.
It'll be back in this evening now I'm back with a machine with Ming 0.3 on it.
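
For anyone who wants the same guard outside the SWF, here is a minimal sketch of the check described above. It is written in Python, not ActionScript, and it is not Potlatch's code: textField.restrict blocks characters as they are typed, whereas this strips them from a finished string. The names CONTROL_CHARS and strip_control_chars are made up for illustration.

```python
import re

# Control characters in the range 0x00-0x1F, the range named in the message.
# Note that this range includes tab (0x09) and newline (0x0A).
CONTROL_CHARS = re.compile(r"[\x00-\x1f]")


def strip_control_chars(value: str) -> str:
    """Return value with every character in 0x00-0x1F removed."""
    return CONTROL_CHARS.sub("", value)


if __name__ == "__main__":
    # Non-ASCII text is left alone; only the control range is dropped.
    print(strip_control_chars("Ævar\x00 Arnfjörð\x1f"))
```

Whether to drop such characters silently or reject the whole request is a separate design choice; the sketch only shows the filtering step.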
As I mentioned to you the other day, it would be really useful if some
Linux-using OSMers could expand the reports at
http://trac.openstreetmap.org/ticket/1936 so we can find exactly _how_ FP
for Linux is breaking encoding, and fix it either in Potlatch or at the API.
From the two examples you give, for two-byte UTF8, it appears to be adding
0x03 before the first byte and 0x83 0xC2 after it. But we need to work out
whether this is a universal pattern for all two-byte UTF8 sequences, and
what happens with longer sequences. This should be fairly trivial for
someone with the Rails port installed on a Linux machine, I'd hope.
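
To make the hypothesis easy to test against more samples, here is a rough Python sketch that applies the pattern as described above (0x03 before the first byte, 0x83 0xC2 after it) to a single two-byte character and compares the result with the bytes actually found in the database. The function names are hypothetical, and the pattern itself is only the guess from the two examples, not a confirmed description of what Flash Player for Linux does.

```python
def hypothesized_corruption(char: str) -> bytes:
    """Apply the guessed pattern to one character that is two bytes in UTF-8."""
    encoded = char.encode("utf-8")
    if len(encoded) != 2:
        # Longer (or shorter) sequences are exactly what is still unknown.
        raise ValueError("pattern is only described for two-byte UTF-8 sequences")
    # 0x03, first byte, 0x83 0xC2, then the untouched second byte.
    return bytes([0x03, encoded[0], 0x83, 0xC2, encoded[1]])


def matches_hypothesis(char: str, observed: bytes) -> bool:
    """Return True if the observed bytes fit the guessed pattern for char."""
    return observed == hypothesized_corruption(char)


if __name__ == "__main__":
    # "é" is 0xC3 0xA9 in UTF-8.
    print(hypothesized_corruption("é").hex(" "))
```

Running matches_hypothesis over characters with different lead bytes (not just 0xC3) would show quickly whether the pattern is universal for two-byte sequences.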
> And as has been pointed out there's an ambiguity as to what
> sequences of bytes can be written to the database whether that
> be full UTF-8 or some XML subset of it.