[OSM-dev] osmosis utf-8

Tom Hughes tom at compton.nu
Thu Nov 8 14:03:07 GMT 2007


In message <4732FA3B.2010608 at bretth.com>
        Brett Henderson <brett at bretth.com> wrote:

> That lines up with what Tom was saying about MySQL using a 
> windows-1252-like encoding.  I'm feeling a little silly, I tried to find 
> the name of the 1252 encoding yesterday to try it out and came to the 
> conclusion java didn't support it, I was wrong (not sure why I didn't 
> see it ...).  I might have fixed this sooner.
>
> Check out:
> http://planet.openstreetmap.org/daily/test-cp1252.osc
>
> Unless I'm mistaken that's the required output!  Thanks for your 
> assistance on this, hugely appreciated.  You've become the "go to" man 
> for utf-8 bug solving ;-)  I'll do a local change to the copy of osmosis 
> on dev to make it write in Cp1252, I'll do a proper fix to make it 
> optional on the command line over the next few days.
>
> TomH, do you know if MySQL uses cp1252 exactly or are there some subtle 
> differences we should be aware of?

When I was trying to unmangle the GPX descriptions from a backup
that had been done through a UTF-8 connection I found that using
iconv to convert from CP1252 didn't seem to work for some reason
and I wound up writing a small program.

That program is attached - note that there are a few characters it
doesn't handle, as they never came up in the GPX descriptions.

The program basically takes a bogus UTF-8 stream from the database
and converts it to a valid one. It first unpacks each UTF-8 sequence
from the input to get an apparent Unicode character, then works out
which input character would have produced that Unicode character
when the data was inserted, hence getting back to the original byte
stream that Ruby added (which is really UTF-8, although MySQL didn't
know that at the time).
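
As a rough illustration of those two steps, here is a minimal sketch
in Python rather than C. It is not the attached unmangle.c; the
function name unmangle is made up for this example. It assumes the
database treated each stored byte as cp1252, and that the five bytes
cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) came out as
the Unicode code points with the same numbers, which is an assumption
about MySQL's behaviour rather than something stated above.

```python
# Bytes that cp1252 leaves undefined. Assumption: these were passed
# through as the same-numbered Unicode code points on insertion.
PASSTHROUGH = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def unmangle(mangled: bytes) -> bytes:
    """Undo one round of 'UTF-8 bytes read as cp1252, re-encoded as UTF-8'."""
    original = bytearray()
    # Step 1: unpack each UTF-8 sequence into an apparent Unicode character.
    for ch in mangled.decode("utf-8"):
        try:
            # Step 2: find the single byte that would have produced it.
            original += ch.encode("cp1252")
        except UnicodeEncodeError:
            if ord(ch) not in PASSTHROUGH:
                raise  # a character this sketch does not handle
            original.append(ord(ch))
    return bytes(original)
```

For example, unmangle(b"caf\xc3\x83\xc2\xa9") gives back
b"caf\xc3\xa9", which is the UTF-8 encoding of "café". Anything that
was not mangled in exactly this way will either raise an error or
come out wrong, so this only makes sense on data known to be affected.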

Hope that makes sense...

Tom

-- 
Tom Hughes (tom at compton.nu)
http://www.compton.nu/

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: unmangle.c
URL: <http://lists.openstreetmap.org/pipermail/dev/attachments/20071108/32396d51/attachment.c>

