[OSM-dev] control characters in planet.osm
Brett Henderson
brett at bretth.com
Fri Jul 13 00:39:56 BST 2007
Brett Henderson wrote:
> It's possible that the parser is wrong, but I find it hard to believe.
> I'm using the standard SAX parser in java (created by
> javax.xml.parsers.SAXParserFactory). I'd have thought these types of
> wrinkles would have been ironed out by now.
>
> It's complaining with the following message:
> An invalid XML character (Unicode: 0x13) was found in the value of
> attribute "v" and element is "tag".
>
> Is it possible that an acute accent character (0x13) is invalid for
> Unicode UTF-8 and should be represented some other way? Anyway, I'll
> have to do some more digging and see what I can find.
This page lists a number of invalid XML cases that the Xerces parser
is supposed to detect. One of them is the use of a 0x13 character
(search for 0x13 in the page).
http://xmlconf.sourceforge.net/xml/reports/report-xerces-cnv.html
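As far as I can tell, XML 1.0 allows only tab (0x09), line feed (0x0A)
and carriage return (0x0D) below 0x20, so a 0x13 should not be appearing
in the planet file and its presence looks like a bug. Am I interpreting
this correctly?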
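
For anyone who wants to check values before they reach the parser, here is
a rough sketch of a test against the XML 1.0 Char production. The class and
method names (XmlCharCheck, isValidXml10Char, firstInvalidIndex) are made up
for illustration and are not part of any OSM tool or of the SAX API.

```java
// Hypothetical helper, not taken from any existing OSM code.
final class XmlCharCheck {

    // True if the code point is allowed by the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the index of the first char that starts an invalid code
    // point, or -1 if the whole string is acceptable to an XML 1.0 parser.
    static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isValidXml10Char(cp)) {
                return i;
            }
            // Step over both halves of a surrogate pair when present.
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

Calling XmlCharCheck.firstInvalidIndex on a tag value before it is written
out would flag a 0x13 without needing a full parse. Whether the right fix is
to strip such characters or to reject the edit is a separate question.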