[OSM-dev] osmosis utf-8

Martijn van Oosterhout kleptog at gmail.com
Thu Nov 8 18:53:01 GMT 2007


On Nov 8, 2007 1:24 PM, Brett Henderson <brett at bretth.com> wrote:
> I guess Cp1252 isn't quite what mysql uses after all.  Although it seems
> like we're on the right track.  Perhaps I need to write my own encoding
> ...  I guess I need to find out what mysql truly does use for latin1.

Not so fast, you're on the right track. If you compare the two, you
will see they differ by one (1!) byte: a 0x81 is converted to a
question mark. The reason is probably that 0x81 is not a valid
character in cp1252. MySQL, being what it is, doesn't complain.
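A small sketch of the effect described above, assuming the JDK's built-in "windows-1252" charset stands in for Cp1252. Decoding replaces the undefined byte 0x81 with U+FFFD, and encoding that back can only produce the encoder's default replacement, '?' (0x3F). The class name Cp1252RoundTrip is made up for illustration.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

final class Cp1252RoundTrip {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode the bytes as windows-1252, then encode the resulting String again.
    static byte[] roundTrip(byte[] input) {
        // new String(...) substitutes U+FFFD for bytes the charset does not define.
        String decoded = new String(input, CP1252);
        // getBytes(...) substitutes the encoder's default replacement ('?') for
        // characters the charset cannot represent, such as U+FFFD.
        return decoded.getBytes(CP1252);
    }

    public static void main(String[] args) {
        // 'A', the undefined byte 0x81, and 0x80 (the Euro sign in cp1252).
        byte[] original = { (byte) 0x41, (byte) 0x81, (byte) 0x80 };
        System.out.println(Arrays.toString(original));
        System.out.println(Arrays.toString(roundTrip(original)));
    }
}
```

Only the middle byte changes. The defined bytes, including 0x80, survive the round trip.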

The choices from here are a bit tricky. You can get the charset
mapping from various places. Perhaps the easiest solution would be to
set the "unmappable character" replacement to 0x81, if Java will let
you. I'm just worried about the other possibly unrepresentable char, 0xAD.
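One way to try the "unmappable character" idea is sketched below, using the standard java.nio CharsetEncoder API (replaceWith, onUnmappableCharacter). Whether the JDK accepts 0x81 as a replacement is not assumed. If replaceWith rejects it with IllegalArgumentException, the sketch returns null. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;

final class Cp1252WithReplacement {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Encode text as windows-1252, writing `replacement` for every character the
    // charset cannot represent. Returns null if the encoder refuses that byte
    // as a replacement (replaceWith throws IllegalArgumentException).
    static byte[] encode(String text, byte replacement) {
        CharsetEncoder enc = CP1252.newEncoder();
        try {
            enc.replaceWith(new byte[] { replacement });
        } catch (IllegalArgumentException rejected) {
            return null;
        }
        enc.onMalformedInput(CodingErrorAction.REPLACE)
           .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            ByteBuffer out = enc.encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[out.remaining()];
            out.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            // Should not happen: both error actions are REPLACE, not REPORT.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 0x81 decodes to U+FFFD. Ask the encoder to turn it back into 0x81.
        String decoded = new String(new byte[] { 0x41, (byte) 0x81 }, CP1252);
        byte[] encoded = encode(decoded, (byte) 0x81);
        System.out.println(encoded == null
                ? "replacement 0x81 rejected"
                : Arrays.toString(encoded));
    }
}
```

Even when the replacement is accepted, it applies to every unmappable character, not just the ones that started out as 0x81. If the JDK table also leaves other bytes undefined (for instance 0x8D), those would decode to U+FFFD as well and come back as 0x81.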

Here's the charset we're talking about:
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-5348_P100-1997&s=ALL

I'm fresh out of ideas here. Part of me says to make your own mapping
table or converter, but how you'd do that within the Java framework I
have absolutely no idea.
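The "own mapping table" option might look like the sketch below. It avoids the Charset machinery altogether. It builds a 256-entry table from the JDK's windows-1252 decoder and maps every byte that decoder leaves undefined to the code point with the same value (0x81 to U+0081, and so on), so every byte round-trips. That pass-through choice is an assumption. It appears to match how the MySQL manual describes its latin1, but that should be checked. Registering this as a real java.nio.charset.Charset would presumably go through java.nio.charset.spi.CharsetProvider, which the sketch does not attempt. The class name LosslessLatin1 is hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical hand-rolled single-byte converter: windows-1252 wherever the JDK
// table defines a character, pass-through to U+00NN where it does not.
final class LosslessLatin1 {
    private static final char[] BYTE_TO_CHAR = new char[256];
    private static final Map<Character, Byte> CHAR_TO_BYTE = new HashMap<>();

    static {
        Charset cp1252 = Charset.forName("windows-1252");
        for (int b = 0; b < 256; b++) {
            char c = new String(new byte[] { (byte) b }, cp1252).charAt(0);
            if (c == '\uFFFD') {
                // Undefined in the JDK's cp1252 table: keep the byte value as the code point.
                c = (char) b;
            }
            BYTE_TO_CHAR[b] = c;
            CHAR_TO_BYTE.put(c, (byte) b);
        }
    }

    static String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append(BYTE_TO_CHAR[b & 0xFF]);
        }
        return sb.toString();
    }

    static byte[] encode(String text) {
        byte[] out = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            Byte b = CHAR_TO_BYTE.get(text.charAt(i));
            if (b == null) {
                throw new IllegalArgumentException(
                        String.format("U+%04X has no single-byte mapping", (int) text.charAt(i)));
            }
            out[i] = b;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        System.out.println("lossless round trip: " + Arrays.equals(all, encode(decode(all))));
    }
}
```

Unlike the replacement-byte trick, each undefined byte gets its own distinct character, so none of them collapse into one another.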

Have a nice day,
-- 
Martijn van Oosterhout <kleptog at gmail.com> http://svana.org/kleptog/



