[OSM-dev] Unicode Errors On Insert

Brett Henderson brett at bretth.com
Wed Jul 25 12:45:20 BST 2007


I've fixed the problem. My tables on the Windows machine were actually 
latin1 even though the server was set to utf8.

I created the database schema on the Windows machine using my script at:
http://www.bretth.com/osmosis/osm_schema_latest.sql

The script was created using mysqldump from my Linux schema, which was 
originally created using the Ruby rake tool.  The script explicitly sets 
the table character set to latin1.  This means that although the server 
reports itself as utf8, the tables are actually latin1.  The utf8 data, 
which would otherwise have been converted to latin1, remains as utf8, 
which fails on insert.  Modifying the script to create tables as utf8 
has fixed the problem.
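
As a rough illustration of that kind of change (not the actual edit made 
to osm_schema_latest.sql), the sketch below rewrites the table default 
character set in mysqldump-style output.  The function name dump_to_utf8 
and the sample table are hypothetical, and the sketch assumes the dump 
uses the usual "DEFAULT CHARSET=latin1" table option; column-level 
character set clauses, if any, are not handled.

```python
import re


def dump_to_utf8(sql: str) -> str:
    """Rewrite latin1 table defaults in mysqldump-style SQL to utf8.

    Hypothetical helper: only the table-level DEFAULT CHARSET option
    is touched.
    """
    return re.sub(r"(DEFAULT CHARSET=)latin1\b", r"\1utf8", sql)


# Hypothetical sample table, not taken from the real OSM schema.
sample = (
    "CREATE TABLE `example` (\n"
    "  `tags` text NOT NULL\n"
    ") ENGINE=MyISAM DEFAULT CHARSET=latin1;\n"
)

converted = dump_to_utf8(sample)
print(converted)
```

Running SHOW CREATE TABLE on a table afterwards shows the character set 
it was really created with, independent of what "status" reports for 
the server and connection.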

Brett Henderson wrote:
> I'm having trouble inserting non-ASCII data into an OSM MySQL schema. 
> I think the characters are legal. The insert statement is attached, 
> I've made it an attachment in case of email conversion errors.
>
> I have two MySQL databases, one on Linux, the other on Windows. Linux 
> is configured with the latin1 character set, Windows with utf8.
>
> The "status" command on the Linux db returns this info:
> Server version: 5.0.27
> Protocol version: 10
> Connection: Localhost via UNIX socket
> Server characterset: latin1
> Db characterset: latin1
> Client characterset: latin1
> Conn. characterset: latin1
>
> "status" on the Windows db returns this info:
> Server version: 5.0.45-community-nt MySQL Community Edition (GPL)
> Protocol version: 10
> Connection: localhost via TCP/IP
> Server characterset: utf8
> Db characterset: utf8
> Client characterset: utf8
> Conn. characterset: utf8
>
> Inserting into the Linux db succeeds, but I appear to be losing data. 
> Extended characters are replaced with "?".
> "name=Sta?e?;place=village;created_by=JOSM"
>
> Inserting into the Windows db fails with the following error message:
> Incorrect string value: '\xC5\x99e\xC4\x8D;...' for column 'tags' at 
> row 1
>
> The data in question is:
> "name=Stařeč;place=village;created_by=JOSM"
> (check the attachment if this doesn't show up in email).
>
> Does anybody have any idea how the db should be configured? I assume 
> that utf8 is preferred, but if so why is this insert failing?
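
Both symptoms quoted above can be reproduced without a database.  The 
following is a minimal sketch in plain Python, assuming only the tag 
string from the message; it does not talk to MySQL, it just shows that 
the bytes in the error message are the UTF-8 encoding of the two 
non-latin1 characters, and that a lossy conversion to latin1 yields the 
"?" substitutions seen on Linux.

```python
# The tag string from the failing insert.
text = "name=Stařeč;place=village;created_by=JOSM"

# UTF-8 encodes ř and č as two bytes each; these are the bytes quoted
# in the "Incorrect string value" error.
utf8_bytes = text.encode("utf-8")
start = utf8_bytes.index(b"\xc5\x99")
print(utf8_bytes[start:start + 5])  # b'\xc5\x99e\xc4\x8d'

# latin1 has no code point for either character, so a lossy
# conversion substitutes "?" for each one.
lossy = text.encode("latin-1", errors="replace").decode("latin-1")
print(lossy)  # name=Sta?e?;place=village;created_by=JOSM
```

This fits the explanation that the Windows tables were really latin1: 
neither character can be represented there, whatever the server and 
connection character sets say.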





