[OSM-dev] UTF-8 problem with API 0.5 planet.osm and osmosis 0.16

Brett Henderson brett at bretth.com
Sat Sep 22 08:46:16 BST 2007


Brett Henderson wrote: 
> It will take me a few hours to get back to this.  I've checked in a 
> new version of osmosis that supplies line number information when 
> parse errors occur but it's only in svn at the moment.
Okay, the line information didn't help because the UTF-8 exception 
results in an IOException, not a SAX parsing exception.  In other words 
the exception is thrown in a lower level piece of code unaware of line 
counts.

But I've tracked the problem down to node 78270.

The name tag in the 0.5 planet is:
  <tag k="name" v="Kronprinsesse M<E4>rthas all<E9>" />
The name tag from the api is:
  <tag k="name" v="Kronprinsesse M<C3><A4>rthas all<C3><A9>"/>

These values were obtained using less, which shows non-printable bytes 
as hex values in angle brackets (e.g. <E4>).  It is not a great way to 
view these files but I don't have much else available on my laptop at 
the moment.  I think it is enough to go on for now.
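
As a rough alternative to eyeballing the file in less, the following 
sketch reports the line number and byte offset of the first sequence 
that fails to decode as UTF-8.  It is only an illustration: the function 
name first_invalid_utf8 and the sample data (including the surrounding 
node element) are made up for this example, and it holds everything in 
memory, so it would suit a small extract rather than a full planet.

```python
def first_invalid_utf8(data: bytes):
    """Return (line_number, byte_offset, offending_bytes) for the first
    invalid UTF-8 sequence in data, or None if data decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Count the newlines before the failing byte to get a 1-based line number.
        line_number = data.count(b"\n", 0, err.start) + 1
        return line_number, err.start, data[err.start:err.end]
    return None


# Hypothetical extract; only the name tag bytes come from the planet file.
sample = (
    b'<node id="78270">\n'
    b'<tag k="name" v="Kronprinsesse M\xe4rthas all\xe9" />\n'
    b"</node>\n"
)

print(first_invalid_utf8(sample))
```

Working on raw bytes like this sidesteps the problem that the decoding 
layer underneath the XML parser knows nothing about line counts.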

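The 0.5 planet contains a byte with hex value "E4" followed directly by 
a plain ASCII "r", which to the best of my knowledge is not valid UTF-8: 
"E4" can only start a three-byte sequence and needs two continuation 
bytes after it.  The api is returning the "C3"-"A4" pair, which *does* 
look like valid UTF-8.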
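
A quick way to check both byte sequences, sketched in Python.  The 
helper is_valid_utf8 is just a name chosen for this example, and the 
Latin-1 decode at the end is my guess at where the single "E4" byte came 
from, not something confirmed from the planet dump itself.

```python
planet_bytes = b"M\xe4rthas all\xe9"        # bytes as seen in the 0.5 planet
api_bytes = b"M\xc3\xa4rthas all\xc3\xa9"   # bytes as returned by the api


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(planet_bytes))       # False
print(is_valid_utf8(api_bytes))          # True
print(api_bytes.decode("utf-8"))         # Märthas allé
# Assumption: the planet bytes are ISO-8859-1 (Latin-1), where E4 is "ä".
print(planet_bytes.decode("latin-1"))    # Märthas allé
```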

It looks to me like the 0.5 planet is incorrect, the api is correct, and 
osmosis is correct.





