[OSM-dev] control characters in planet.osm

Brett Henderson brett at bretth.com
Fri Jul 13 00:30:09 BST 2007


David Earl wrote:
> According to the XML spec:
>
> <quote>
> In the content of elements, character data is any string of characters 
> which does not contain the start-delimiter of any markup and does not 
> include the CDATA-section-close delimiter, "]]>". In a CDATA section, 
> character data is any string of characters not including the 
> CDATA-section-close delimiter, "]]>".
> </quote>
>
> The only escapes are for < & " and >
>
> So I think your parser is wrong.
>
> David
>   
It's possible that the parser is wrong, but I find it hard to believe.  
I'm using the standard SAX parser in Java (created by 
javax.xml.parsers.SAXParserFactory).  I'd have thought these kinds of 
wrinkles would have been ironed out by now.

It's complaining with the following message:
An invalid XML character (Unicode: 0x13) was found in the value of 
attribute "v" and element is "tag".
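
A minimal sketch for reproducing this without the full planet dump, assuming 
the parser bundled with the JDK. The class name, the helper method and the 
sample "tag" values are made up for illustration and are not taken from 
planet.osm:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
final class ControlCharRepro {

    // Returns the parser's error message, or null if the document parsed cleanly.
    static String parseError(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler()); // default fatalError() rethrows the exception
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O problems are not what we are testing here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Raw U+0013 inside the attribute value: expected to be rejected.
        System.out.println(parseError("<osm><tag k=\"name\" v=\"caf\u0013\"/></osm>"));
        // A real e-acute (U+00E9) in the same position: expected to parse.
        System.out.println(parseError("<osm><tag k=\"name\" v=\"caf\u00e9\"/></osm>"));
    }
}
```

If the first call reports an error and the second does not, the problem is 
the specific code point rather than accented characters in general.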

Is it possible that an acute accent character (0x13) is invalid in 
UTF-8 encoded XML and should be represented some other way?  Anyway, 
I'll have to do some more digging and see what I can find.
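
For what it's worth, the XML 1.0 spec restricts which code points may appear 
in a document at all, separately from the escaping rules quoted above. Its 
"Char" production allows #x9, #xA, #xD and the ranges starting at #x20, so 
0x13 (the C0 control character DC3, not an accented letter) falls outside 
it. A rough sketch of that check follows; the class and method names are my 
own and purely illustrative:

```java
// Hypothetical helper class, for illustration only.
final class XmlChars {

    // XML 1.0 "Char" production:
    //   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Index of the first code point XML 1.0 does not allow, or -1 if there is none.
    static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isValidXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp); // step over surrogate pairs as one code point
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(isValidXml10Char(0x13));          // control character DC3
        System.out.println(isValidXml10Char(0xE9));          // e-acute
        System.out.println(firstInvalidIndex("caf\u0013"));  // made-up sample value
    }
}
```

A check like this could be run over tag values before they are written out, 
so that the offending values can be found without relying on the parser's 
error position.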





