[OSM-dev] osmosis utf-8

Tue Nov 6 22:01:46 GMT 2007

Brett Henderson wrote: 
> From what I can tell, setting the connection character set as
> suggested by Tom doesn't work. I've tried setting all manner of
> connection properties to UTF-8 and ISO8859_1 (latin1). Each setting
> just seemed to configure both the server and the client side. I
> couldn't find a way to get the server to send in one encoding and the
> client to read in another; I think the JDBC driver tells the server to
> change if the client changes. When using a Hebrew encoding, I ended up
> with ? characters, as if the server wasn't able to encode the data and
> wrote ?, which the client then read. But without sniffing the
> connection it's hard to tell exactly what's going on.
>
> Martijn, I can try your trick and suspect it may work, but it's going
> to be a lot of coding effort because JDBC string reads occur all over
> the shop in osmosis. If the MySQL latin1 differs from Java ISO8859_1
> then I'm screwed anyway. I'm also concerned that it will only work for
> characters that fit into the latin1 encoding. I wonder what would
> happen to other characters, such as Chinese.
>
> How hard is it to fix the main db? Does it just require a dump and 
> restore or is more substantial surgery required? Or don't we know yet?
>
> Cheers,
> Brett
Last night I also tried running the "set names 'latin1'" statement from
code.  The JDBC driver documentation explicitly says not to use this
command because the driver won't detect the change.  That sounded like
exactly what I wanted, but again it seemed to make no difference, which
surprised me.

Specifically, I set the characterEncoding property to "UTF-8", and then
executed "set names 'latin1'" from code.  I also tried the reverse,
setting characterEncoding to ISO8859_1 and running "set names 'utf8'"
from code.

To make sure the statement was actually being executed, I ran the
invalid statement "set names 'latin1x'", which caused osmosis to crash.
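
For anyone wanting to repeat the experiment, it can be sketched roughly
as below.  This is not the actual osmosis code: the class and method
names are made up for illustration, and it assumes MySQL Connector/J is
on the classpath and that the caller supplies a JDBC URL for a test
database.  The only driver-specific parts are the useUnicode and
characterEncoding connection properties and the "set names" statement.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

// Hypothetical helper, not part of osmosis.
final class EncodingExperiment {

    // Builds Connector/J connection properties. characterEncoding takes a
    // Java charset name such as "UTF-8" or "ISO8859_1".
    static Properties connectionProperties(String user, String password,
                                           String characterEncoding) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("useUnicode", "true");
        props.setProperty("characterEncoding", characterEncoding);
        return props;
    }

    // Builds the statement sent behind the driver's back. The argument is a
    // MySQL charset name such as "latin1" or "utf8".
    static String setNamesStatement(String mysqlCharset) {
        if (!mysqlCharset.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("bad charset name: " + mysqlCharset);
        }
        return "SET NAMES '" + mysqlCharset + "'";
    }

    // Opens a connection with one encoding configured in the driver and
    // then asks the server for another via SET NAMES.
    static Connection open(String url, Properties props, String mysqlCharset)
            throws SQLException {
        Connection connection = DriverManager.getConnection(url, props);
        try (Statement statement = connection.createStatement()) {
            statement.execute(setNamesStatement(mysqlCharset));
        } catch (SQLException e) {
            connection.close();
            throw e;
        }
        return connection;
    }
}
```

The two combinations I tried correspond to
connectionProperties(user, password, "UTF-8") with "latin1", and
connectionProperties(user, password, "ISO8859_1") with "utf8".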

Given that this is a significant flaw in the current production setup,
surely it makes sense to fix it rather than constantly work around it?
I'm willing to help out if necessary.  I'd rather fix this problem at
the source than spend any more time trying to work around it in code.
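
To make the workaround concrete, here is a minimal sketch in plain Java
with no database involved.  It assumes the workaround amounts to
re-encoding each string as latin1 bytes and decoding those bytes as
UTF-8.  That is my reading of the trick rather than a confirmed
description, and the class and method names are made up.  It also shows
where the ? characters come from when text is pushed through a charset
that cannot represent it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of osmosis.
final class CharsetMismatchDemo {

    // Simulates UTF-8 bytes being decoded as latin1 on the way out of
    // the database, which produces garbled text.
    static String misreadAsLatin1(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the misread: recover the raw bytes via latin1, then
    // decode them as the UTF-8 they really were.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Encodes text into a charset and back again. Characters the charset
    // cannot represent are replaced during encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String hebrew = "\u05e9\u05dc\u05d5\u05dd";   // four Hebrew letters
        String chinese = "\u4e2d\u6587";              // two Chinese characters

        // The repair works for any text, because Java's ISO_8859_1 maps
        // every byte value to a character and back.
        System.out.println(repair(misreadAsLatin1(hebrew)).equals(hebrew));
        System.out.println(repair(misreadAsLatin1(chinese)).equals(chinese));

        // Encoding Hebrew directly as latin1 loses it.
        System.out.println(roundTrip(hebrew, StandardCharsets.ISO_8859_1)); // ????
    }
}
```

In pure Java the repair is lossless even for Hebrew and Chinese, since
the garbled string only ever contains characters in the latin1 range.
Whether that holds against the real database depends on MySQL's latin1
mapping every byte the same way Java's ISO8859_1 does.  As far as I know
MySQL's latin1 is closer to windows-1252, so this remains an open
question.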

Do I need to get Rails running locally, duplicate the production
problem, and then create a process to fix it?  Or is the fix already
known?

Cheers,
Brett