[OSM-dev] control characters in planet.osm

Brett Henderson brett at bretth.com
Fri Jul 13 00:39:56 BST 2007


Brett Henderson wrote:
> It's possible that the parser is wrong, but I find it hard to believe.  
> I'm using the standard SAX parser in java (created by 
> javax.xml.parsers.SAXParserFactory).  I'd have thought these types of 
> wrinkles would have been ironed out by now.
>
> It's complaining with the following message:
> An invalid XML character (Unicode: 0x13) was found in the value of 
> attribute "v" and element is "tag".
>
> Is it possible that an acute accent character (0x13) is invalid for 
> Unicode UTF-8 and should be represented some other way?  Anyway, I'll 
> have to do some more digging and see what I can find.
This page lists a number of invalid XML scenarios that the Xerces parser 
is supposed to detect.  One of them is the use of a 0x13 character 
(search for 0x13 in the page):
http://xmlconf.sourceforge.net/xml/reports/report-xerces-cnv.html

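So it appears that 0x13 is not a legal XML character (it is a control 
character, DC3, not an acute accent as I guessed above).  That would mean 
it should not appear in the planet file at all, and that its presence is 
a bug.  Am I interpreting this correctly?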
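
For anyone who wants to reproduce this without the full planet file, here 
is a minimal sketch.  It assumes the Char production from section 2.2 of 
the XML 1.0 spec and the stock JDK SAX parser.  The class name, the method 
names and the sample tag are made up for illustration; nothing here comes 
from the planet dump code.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
class XmlCharCheck {

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Index of the first char that XML 1.0 forbids, or -1 if there is none.
    static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isValidXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    // True if the JDK SAX parser accepts the document, false if it
    // reports a parse error such as an invalid character.
    static boolean parses(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O trouble, not a document error.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Sample tag with a raw 0x13 in the "v" attribute (made-up data).
        String value = "caf" + (char) 0x13;
        String xml = "<tag k=\"name\" v=\"" + value + "\"/>";
        System.out.println("first invalid index: " + firstInvalidIndex(value));
        System.out.println("parser accepts it: " + parses(xml));
    }
}
```

A check like firstInvalidIndex could be run over tag values before they 
are written out, so that a bad value is caught at the source rather than 
by whoever parses the file later.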





