[OSM-dev] UTF-8 problem with API 0.5 planet.osm and osmosis 0.16

Brett Henderson brett at bretth.com
Sat Sep 22 09:04:56 BST 2007

Ooh, one other thing.  You mentioned that you've written a batch file
wrapper for Windows.  Can I have it? :-)  The only reason there isn't
one in there already is that I didn't know how to do it and hadn't
gotten around to stealing one from elsewhere.

Karl Newman wrote:
> I'm trying to use osmosis 0.16 to slice out a 1-degree square section 
> of the 0.5 planet.osm dump (specifically the 070905 file listed on the 
> Wiki page), but I'm getting a UTF-8 conversion error. Here's the 
> command line I'm using:
> osmosis --read-xml-0.5 file=planet-api05-070905.osm --bounding-box-0.5 
> left=-123 right=-122 top=46 bottom=45 --write-xml-0.5 file=dump.osm
> Here's the exception stack trace:
> Exception in thread "Thread-1-read-xml-0.5 " com.bretth.osmosis.core.OsmosisRuntimeException: Unable to read XML file.
>         at com.bretth.osmosis.core.xml.v0_5.XmlReader.run(XmlReader.java:107)
>         at java.lang.Thread.run(Unknown Source)
> Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 3-byte UTF-8 sequence.
>         at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
>         at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
>         at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
> ... etc.
> I'm using osmosis on Windows and have written my own batch file 
> wrapper which mostly duplicates the shell script functions. I have 
> successfully used osmosis to read from the 0.5 api on
> http://openstreetmap.gryph.de, so I think osmosis is working
> correctly.
> Could it be a line-endings problem? Is there a known issue about 
> UTF-8? As you can see, unfortunately the exception gives no line 
> number in the source document, so it's impossible to nail it down. 
> However, the exception happens almost immediately, so it must be 
> occurring early in the file. I didn't see anything strange peeking at 
> it with head.
> Thanks for your time.
> Karl Newman
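
Since the exception carries no position, one way to narrow it down might
be to scan the raw bytes of the dump and report where UTF-8 decoding
first fails. The following is only a rough sketch in Python, not part of
osmosis: the function name first_invalid_utf8 and the chunk size are
made up for illustration, and Python's decoder may not agree with the
Xerces UTF8Reader on every edge case. It reads the file in chunks, so it
should cope with a planet-sized file.

```python
import codecs
import sys


def first_invalid_utf8(stream, chunk_size=1 << 20):
    """Return (byte_offset, line_number) of the first invalid UTF-8
    sequence in a binary stream, or None if the stream decodes cleanly."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    consumed = 0   # bytes handed to the decoder in earlier chunks
    newlines = 0   # b"\n" bytes seen in earlier chunks
    while True:
        chunk = stream.read(chunk_size)
        final = not chunk  # an empty read means end of file
        # Bytes of an incomplete multi-byte sequence held over from the
        # previous chunk; the decoder prepends them to this chunk.
        pending = decoder.getstate()[0]
        try:
            decoder.decode(chunk, final)
        except UnicodeDecodeError as err:
            # err.start is an index into pending + chunk.
            offset = consumed - len(pending) + err.start
            line = newlines + (pending + chunk)[:err.start].count(b"\n") + 1
            return offset, line
        if final:
            return None
        consumed += len(chunk)
        newlines += chunk.count(b"\n")


# Hypothetical command-line use: python find_bad_utf8.py planet.osm
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as f:
        print(first_invalid_utf8(f))
```

The byte offset is zero-based and the line number is one-based. If the
failure really is near the start of the file, the reported line should
be small enough to look at directly, keeping in mind that a bad byte may
be invisible in a terminal.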
