[OSM-dev] osmosis utf-8

Martijn van Oosterhout kleptog at gmail.com
Thu Nov 8 11:37:06 GMT 2007


On Nov 8, 2007 2:16 AM, Brett Henderson <brett at bretth.com> wrote:
> Is there a simple tcp proxy/tunnel application I can use to log
> connection data to file?  It might be more useful than me guessing at
> what is going on between osmosis and the database.

Maybe tcpdump, if the connection isn't encrypted...

> I've just created test-utf8.osc and test-iso-8859-1.osc in the
> http://planet.openstreetmap.org/daily
> Both are performed with a utf-8 database connection.  The output file
> encoding is changed as indicated by the file name.

Ok, this is weird. The utf8 file has c3 83 c5 b8 and the iso-8859-1
has c3 3f. Now utf8(c3 83) = latin1(c3) so that's good. But utf8(c5
b8) is unicode(0x178), which has no latin1 encoding (it's a Y with two
dots above it, 'Ÿ'), hence the 3f ('?') in the iso-8859-1 file.
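
To double-check the byte arithmetic, here is a small Python sketch. It
only looks at the four bytes quoted above; the variable names are made
up for illustration and have nothing to do with osmosis itself, and the
'?' substitution is Python's replacement behaviour, which I'm assuming
matches what the Java side did.

```python
# Bytes seen in the utf8 output file.
utf8_bytes = bytes.fromhex("c383c5b8")

# Decode them as UTF-8 and list the resulting code points.
text = utf8_bytes.decode("utf-8")
code_points = [hex(ord(ch)) for ch in text]  # ['0xc3', '0x178']

# U+00C3 fits in latin1, U+0178 does not, so it becomes '?' (0x3f).
latin1_bytes = text.encode("iso-8859-1", errors="replace")

print(code_points, latin1_bytes.hex())
```

That gives c3 3f for the latin1 output, matching the iso-8859-1 file.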

I'm going to take a guess in suggesting the character is supposed to
be a 'ß', unicode(0xDF) = utf8(c3 9f). It turns out that in windows
code page 1252 the character 'Ÿ' is represented by 0x9f, so the bytes
c3 9f read as cp1252 come out as 'Ã' followed by 'Ÿ'. So we have one
or more of:

1. what mysql thinks is latin1 is actually not latin1
2. ruby is connecting using windows code page 1252
3. the recoding from the server encoding to java is wrong

In any case, can you set the file output encoding to windows cp1252
and see what happens?
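
If the guess is right, the round trip would look roughly like the
Python sketch below. It assumes the original character really was 'ß'
and that the misreading happened exactly once; both are assumptions on
my part, and the names are only illustrative.

```python
# Guessed original character: 'ß' (U+00DF).
intended = "\u00df"
stored = intended.encode("utf-8")      # c3 9f

# Somewhere the two bytes get read as cp1252 instead of UTF-8.
misread = stored.decode("cp1252")      # U+00C3 followed by U+0178

# Writing that out as UTF-8 gives what the utf8 file contains.
observed = misread.encode("utf-8")     # c3 83 c5 b8

# Writing it out as cp1252 instead should give back the original bytes.
restored = misread.encode("cp1252")

print(observed.hex(), restored.hex())
```

So with cp1252 as the output encoding I'd expect c3 9f in the file,
which a UTF-8 viewer would then show as 'ß'.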

Hope this helps,
-- 
Martijn van Oosterhout <kleptog at gmail.com> http://svana.org/kleptog/



