[OSM-dev] UTF-8 problem with API 0.5 planet.osm and osmosis 0.16

Karl Newman siliconfiend at gmail.com
Fri Sep 21 19:37:03 BST 2007


I'm trying to use osmosis 0.16 to slice out a 1-degree square section of the
0.5 planet.osm dump (specifically the 070905 file listed on the Wiki page),
but I'm getting a UTF-8 conversion error. Here's the command line I'm using:
osmosis --read-xml-0.5 file=planet-api05-070905.osm --bounding-box-0.5 left=-123 right=-122 top=46 bottom=45 --write-xml-0.5 file=dump.osm
Here's the exception stack trace:
Exception in thread "Thread-1-read-xml-0.5" com.bretth.osmosis.core.OsmosisRuntimeException: Unable to read XML file.
        at com.bretth.osmosis.core.xml.v0_5.XmlReader.run(XmlReader.java:107)
        at java.lang.Thread.run(Unknown Source)
Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 3-byte UTF-8 sequence.
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
... etc.
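For reference, here is what the Xerces message refers to. In UTF-8, a byte of the form 1110xxxx (0xE0 to 0xEF) starts a 3-byte sequence and must be followed by two continuation bytes of the form 10xxxxxx (0x80 to 0xBF); "Invalid byte 2" means the byte right after such a lead byte is not a continuation byte. One plausible cause, assumed here rather than confirmed for this dump, is ISO-8859-1 text mixed into the file: Latin-1 "é" is the single byte 0xE9, which a UTF-8 decoder reads as a 3-byte lead. The helper below (check_three_byte_sequence is a hypothetical name, not part of osmosis or Xerces) is a minimal Python sketch of that structural check only; it does not test for overlong forms or surrogates.

```python
# Hypothetical diagnostic helper; not part of osmosis or Xerces.
def check_three_byte_sequence(data: bytes, start: int = 0):
    """Structurally check a 3-byte UTF-8 sequence beginning at data[start].

    Returns None if the lead byte is followed by two continuation bytes,
    otherwise the 1-based position (2 or 3) of the first byte that is
    missing or is not a continuation byte. Overlong encodings and
    surrogates are deliberately not checked in this sketch.
    """
    lead = data[start]
    if lead & 0xF0 != 0xE0:  # lead byte must look like 1110xxxx
        raise ValueError(f"byte 0x{lead:02X} does not start a 3-byte sequence")
    for pos in (1, 2):
        i = start + pos
        # continuation bytes must look like 10xxxxxx
        if i >= len(data) or data[i] & 0xC0 != 0x80:
            return pos + 1
    return None


# "€" encoded as UTF-8 is E2 82 AC: a well-formed 3-byte sequence.
print(check_three_byte_sequence("€".encode("utf-8")))           # None
# "café " encoded as Latin-1 puts 0xE9 directly before a space (0x20).
print(check_three_byte_sequence("café ".encode("latin-1"), 3))  # 2
```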
I'm using osmosis on Windows and have written my own batch file wrapper
which mostly duplicates what the shell script does. I have successfully used
osmosis to read from the 0.5 API on openstreetmap.gryph.de, so I think
osmosis itself is working correctly.

Could it be a line-endings problem? Is there a known UTF-8 issue with this
dump? As you can see, the exception unfortunately gives no line number or
position in the source document, so I can't pin down where the bad bytes are.
However, the exception is thrown almost immediately, so the problem is
presumably early in the file. I didn't see anything strange when peeking at
the start of the file with head.
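Since the exception carries no position, one way to locate the problem is to scan the raw bytes directly. Line endings alone should not be the cause: CR (0x0D) and LF (0x0A) are single ASCII bytes and valid UTF-8. The sketch below is a hypothetical helper (find_first_invalid_utf8 is not part of osmosis), assuming Python 3's built-in strict UTF-8 decoder. It reads the file in chunks so a multi-gigabyte planet dump never has to fit in memory, carries a multi-byte sequence that is split across a chunk boundary over to the next read, and reports the byte offset, line number, and decoder reason for the first bad sequence.

```python
import io
from typing import BinaryIO, Optional, Tuple


# Hypothetical diagnostic helper; not part of osmosis.
def find_first_invalid_utf8(
    stream: BinaryIO, chunk_size: int = 1 << 20
) -> Optional[Tuple[int, int, str]]:
    """Return (byte_offset, line_number, reason) for the first invalid UTF-8
    sequence in a binary stream, or None if the whole stream decodes cleanly.
    byte_offset is 0-based and points at the start of the bad sequence;
    line_number is 1-based and counts LF bytes."""
    carry = b""  # undecoded tail of the previous chunk
    base = 0     # absolute offset of carry[0] in the stream
    line = 1
    while True:
        chunk = stream.read(chunk_size)
        at_eof = not chunk
        data = carry + chunk
        try:
            data.decode("utf-8")
            good = len(data)
        except UnicodeDecodeError as e:
            if not at_eof and e.reason == "unexpected end of data":
                # The sequence is only cut off by the chunk boundary;
                # retry it together with the next chunk.
                good = e.start
            else:
                return base + e.start, line + data.count(b"\n", 0, e.start), e.reason
        line += data.count(b"\n", 0, good)
        base += good
        carry = data[good:]
        if at_eof:
            return None


if __name__ == "__main__":
    sample = io.BytesIO(b'<osm>\n<tag v="caf\xe9 "/>\n')
    print(find_first_invalid_utf8(sample))  # (17, 2, 'invalid continuation byte')
```

To run it against the dump, open the file in binary mode, for example with open("planet-api05-070905.osm", "rb") as f: print(find_first_invalid_utf8(f)); the reported offset can then be inspected with a hex viewer.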

Thanks for your time.

Karl Newman