[OSM-dev] UTF8 problem with last night's daily .osc

Andy Allan gravitystorm at gmail.com
Sun Aug 31 16:10:34 BST 2008


On Sun, Aug 31, 2008 at 1:08 AM, Grant Slater
<openstreetmap at firefishy.com> wrote:
> Karl Newman wrote:
>> If I recall correctly, the database column is not actually set for
>> UTF-8 (but is double-encoded to return actual UTF-8 to the client...).
>> Wouldn't it be a better long-term fix to change the database to UTF-8
>> (or whatever), then presumably MySql wouldn't allow invalid sequences
>> to be stored? Still would be a good idea to raise an error if the
>> length was too long, though.
>>
>
> Yes, this is the case.

For anyone who is following this at home, the column *is* actually
UTF-8, but it's true that the data in it is double-encoded. So MySQL is
currently behaving properly in not splitting multi-byte characters - but
what it sees as the 256th character might actually be the doubly-encoded
second half of a wide character. If you know what I mean.
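
In case it helps, here is a rough Python sketch of that failure mode. It
is only an illustration, not the actual server code: the 255-character
limit is an assumption (inferred from "the 256th character" above), the
function names are made up, and Python's latin-1 codec stands in for
whatever single-byte charset the connection really uses.

```python
# Sketch of double-encoding plus character-based truncation.
# Assumptions: a 255-character column limit, and Python's latin-1 codec
# standing in for the connection's single-byte charset.

LIMIT = 255  # assumed column length, in characters


def double_encode(text: str) -> str:
    """Return what the column holds: each UTF-8 byte of the input
    reinterpreted as one single-byte character."""
    return text.encode("utf-8").decode("latin-1")


def read_back(stored: str) -> bytes:
    """Return the bytes the client receives: each stored character
    mapped back to one byte."""
    return stored.encode("latin-1")


def is_valid_utf8(raw: bytes) -> bool:
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 255 real characters, the last one being two bytes wide in UTF-8.
original = "a" * 254 + "\u00e9"

stored = double_encode(original)   # the database sees 256 characters
truncated = stored[:LIMIT]         # cut on a character boundary, as it should
raw = read_back(truncated)         # but the client gets half of the e-acute

print(len(original), len(stored), is_valid_utf8(raw))
```

The truncation itself is on a proper character boundary from the
database's point of view; it is only after undoing the double-encoding
that the last byte turns out to be a stray UTF-8 lead byte.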

And as TomH explained yesterday after the OSMF AGM, sorting it out
involves dumping and reloading the entire db, so it'll most likely
wait until all the 0.6 migrations happen anyway. But I'd be reluctant
to utter 'once and for all' :-)

Cheers,
Andy



