[OSM-dev] Problem with 2010/03/10 planet...

Jon Burgess jburgess777 at googlemail.com
Thu Mar 11 23:33:03 GMT 2010


On Thu, 2010-03-11 at 17:05 -0600, Jeffrey Ollie wrote:
> Getting the following traceback trying to extract some data from the
> March 10th planet file.  I'm using osmosis 0.34.
> 
> org.openstreetmap.osmosis.core.OsmosisRuntimeException: Unable to parse xml file
>  /dev/stdin.  publicId=(null), systemId=(null), lineNumber=529642199,
> columnNumber=27.
>         at org.openstreetmap.osmosis.core.xml.v0_6.XmlReader.run(XmlReader.java:113)
>         at java.lang.Thread.run(Thread.java:636)
> Caused by: org.xml.sax.SAXParseException: Character reference "&#24"
> is an invalid XML character.

It looks like this was caused by a change made by Frederick back in
r19176. The planet dump code used to turn all characters less than 32
into '?' instead of creating these character sequences. I guess he
didn't read the bit of the XML spec which says that all characters <32
are invalid except for tab / newline / carriage return[1]. It makes no
difference whether they exist as plain characters or character entities,
they are still not allowed, e.g.

$ echo "<test>&#24;</test>" | xmllint - -noout
-:1: parser error : xmlParseCharRef: invalid xmlChar value 24
<test>&#24;</test>
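For reference, the Char production from [1] can be written as a small predicate. The following is a minimal Python sketch, and the function name is hypothetical (it is not from the planet dump code). It shows why code point 24, the one named by "&#24", is rejected no matter how it is encoded:

```python
def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)            # tab, newline, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

# Code point 24 (U+0018) falls outside every allowed range.
print(is_xml10_char(24))   # False
print(is_xml10_char(0x9))  # True
```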

I have committed a change which should resolve this for future dumps
in r20430, but someone needs to compile it and update the copy on the
server.

	Jon


1: http://www.w3.org/TR/REC-xml/#NT-Char

