[OSM-dev] control characters in planet.osm
Frederik Ramm
frederik at remote.org
Thu Jul 12 12:40:10 BST 2007
Hi,
I'd probably never have noticed since I usually process the
planet file with regular expressions instead of a proper XML parser.
I used "osmosis" last week, and its XML parser refused to process way
4845936 (highway=secondary, name=Queen ^Street) because the value for
"name" contained an un-escaped control character (hex 0x13, ASCII 19, here
depicted as ^S). (It took some time to find out that the ^S was at
the root of the problem, and it was Brett Henderson who found it.)
The problem is fixed in this week's planet file; however, another ^S
has appeared in way 4827686 (highway=motorway, ref=A30, nat_ref=^SA30).
That problem is also fixed in the database.
Q1: Both ways were created by Potlatch alpha. While having a control
character in a tag value is not technically invalid, I do not think
that these were inserted on purpose. Maybe there is something about
the Potlatch UI that makes people erroneously insert these ^Ses?
Q2: Is it valid XML to have an un-escaped ^S somewhere in the
attribute CDATA? If yes, then the XML parser used by osmosis should
be repaired. If no, then the XML exporter writing the planet file
should be repaired.
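
For reference, the XML 1.0 "Char" production permits only tab (0x09), LF (0x0A) and CR (0x0D) below 0x20, so a parser following it strictly would be expected to reject 0x13. Below is a minimal sketch of how such characters could be located in a file before handing it to a parser. The function name find_control_chars and the sample line are hypothetical, made up for illustration; the sample merely mimics the nat_ref value mentioned above.

```python
import re

# C0 control characters that fall outside the XML 1.0 "Char" production:
# everything below 0x20 except tab (0x09), LF (0x0A) and CR (0x0D).
FORBIDDEN = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_control_chars(lines):
    """Yield (line_number, column, code_point) for each forbidden character.

    Line numbers and columns are 1-based. `lines` can be any iterable of
    strings, for example an open text file.
    """
    for lineno, line in enumerate(lines, start=1):
        for match in FORBIDDEN.finditer(line):
            yield lineno, match.start() + 1, ord(match.group())


# Hypothetical sample line with a raw 0x13 in the attribute value.
sample = ['<tag k="nat_ref" v="\x13A30"/>']

for lineno, column, code in find_control_chars(sample):
    print(f"line {lineno}, column {column}: control character 0x{code:02x}")
```

The same function could be pointed at an open file object to scan a whole planet file line by line, assuming the file is opened in text mode with a suitable encoding.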
Bye
Frederik
--
Frederik Ramm ## eMail frederik at remote.org ## N49°00.09' E008°23.33'