[OSM-dev] control characters in planet.osm

Frederik Ramm frederik at remote.org
Thu Jul 12 12:40:10 BST 2007


Hi,

I'd probably never have noticed this, since I usually process the
planet file with regular expressions instead of a proper XML parser.
Last week, however, I used "osmosis", and its XML parser refused to
process way 4845936 (highway=secondary, name=Queen ^Street) because the
value for name contained an un-escaped control character (hex 0x13,
ASCII 19, depicted here as ^S). (It took some time to find out that the
^S was at the root of the problem, and it was Brett Henderson who found
it.)
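
A minimal sketch for reproducing this kind of rejection. It assumes
that Python's expat-based xml.etree.ElementTree behaves like the osmosis
parser in this respect (osmosis itself is Java, so that is only an
assumption). The fragment and the helper name parses_ok are made up for
illustration and are not taken from the planet file.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on the way described above; the raw
# character 0x13 (^S) sits un-escaped inside the attribute value.
fragment = '<way id="4845936"><tag k="name" v="Queen \x13Street"/></way>'


def parses_ok(xml_text):
    """Return True if the parser accepts xml_text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(parses_ok(fragment))                      # with the ^S
    print(parses_ok(fragment.replace("\x13", "")))  # with the ^S removed
```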

That problem is fixed in this week's planet file. However, another ^S
has appeared in way 4827686 (highway=motorway, ref=A30, nat_ref=^SA30).

That one is also fixed in the database.

Q1: Both ways were created by Potlatch alpha. While having a control  
character in a tag value is not technically invalid, I do not think  
that these were inserted on purpose. Maybe there is something about  
the Potlatch UI that makes people erroneously insert these ^Ses?

Q2: Is it valid XML to have an un-escaped ^S somewhere in the  
attribute CDATA? If yes, then the XML parser used by osmosis should  
be repaired. If no, then the XML exporter writing the planet file  
should be repaired.
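
One way to check a string against the rule in question, as a sketch
only. It assumes that the "Char" production of XML 1.0 is what decides
validity here; that production allows tab, LF and CR but no other
character below 0x20, so 0x13 falls outside it. The names ILLEGAL_XML10
and find_illegal_xml_chars are hypothetical.

```python
import re

# Complement of the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
ILLEGAL_XML10 = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text):
    """Return (offset, code point) for each character outside XML 1.0 Char."""
    return [(m.start(), ord(m.group())) for m in ILLEGAL_XML10.finditer(text)]


if __name__ == "__main__":
    # Tag value modelled on way 4827686 as described above.
    for offset, codepoint in find_illegal_xml_chars("nat_ref=\x13A30"):
        print(f"offset {offset}: 0x{codepoint:02X}")
```

An exporter could run a check like this on every tag key and value
before writing them, and an importer could use it to reject or clean
such values on the way in.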

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00.09' E008°23.33'





