[OSM-dev] UTF-8 problem with API 0.5 planet.osm and osmosis 0.16
brett at bretth.com
Sat Sep 22 08:46:16 BST 2007
Brett Henderson wrote:
> It will take me a few hours to get back to this. I've checked in a
> new version of osmosis that supplies line number information when
> parse errors occur but it's only in svn at the moment.
Okay, the line information didn't help because the UTF-8 exception
results in an IOException, not a SAX parsing exception. In other words,
the exception is thrown in a lower-level piece of code unaware of line
numbers.
But I've tracked the problem down to node 78270.
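For anyone wanting to find such a spot without relying on the XML parser, here is a minimal sketch in Python that scans raw bytes line by line and reports the first line that fails to decode as UTF-8. The function name and the sample data are made up for illustration; this is not how osmosis does it.

```python
def first_invalid_utf8_line(lines):
    """Return (line_number, byte_offset_in_line) for the first line that is
    not valid UTF-8, or None if every line decodes cleanly.

    `lines` is any iterable of bytes objects, e.g. a file opened in "rb" mode.
    """
    for number, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first offending byte in `raw`
            return number, err.start
    return None


# Hypothetical two-line sample; the second line carries a bare E4 byte.
sample = [
    b'<node id="78270">\n',
    b'<tag k="name" v="Kronprinsesse M\xe4rthas all\xe9" />\n',
]
result = first_invalid_utf8_line(sample)
print(result)
```

On a real file the same function could be called as `first_invalid_utf8_line(open(path, "rb"))`, where `path` is wherever the planet dump lives.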
The name tag in the 0.5 planet is: <tag k="name" v="Kronprinsesse
M<E4>rthas all<E9>" />
The name tag from the api is: <tag k="name" v="Kronprinsesse
The values above were obtained using less, which is not a great way to
view these files, but I don't have much else available on my laptop at
the moment. But I think it is enough to go on.
The 0.5 planet contains a single byte with hex value "E4" which, to the
best of my knowledge, is not valid UTF-8 on its own. The api is returning
the "C3"-"A4" pair, which *does* look like valid UTF-8.
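The byte-level claim can be checked with a few lines of Python. This is only a sketch: the two fragments are shortened stand-ins for the values quoted above, and the helper name is invented here.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


planet_fragment = b"M\xe4rthas"     # single E4 byte, as seen in the 0.5 planet
api_fragment = b"M\xc3\xa4rthas"    # C3 A4 pair, as returned by the api

print(is_valid_utf8(planet_fragment))  # False
print(is_valid_utf8(api_fragment))     # True

# E4 is the ISO-8859-1 code for the same character that C3 A4 encodes in UTF-8.
same_text = planet_fragment.decode("iso-8859-1") == api_fragment.decode("utf-8")
print(same_text)                       # True
```

The last comparison would be consistent with the planet value having been written as ISO-8859-1 rather than UTF-8, though that is a guess and not something established above.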
It looks to me like the 0.5 planet is incorrect, the api is correct, and
osmosis is correct.