[OSM-dev] UTF-8 problem with API 0.5 planet.osm and osmosis 0.16
brett at bretth.com
Sat Sep 22 01:56:28 BST 2007
Karl Newman wrote:
> I'm trying to use osmosis 0.16 to slice out a 1-degree square section
> of the 0.5 planet.osm dump (specifically the 070905 file listed on the
> Wiki page), but I'm getting a UTF-8 conversion error. Here's the
> command line I'm using:
> osmosis --read-xml-0.5 file=planet-api05-070905.osm --bounding-box-0.5
> left=-123 right=-122 top=46 bottom=45 --write-xml-0.5 file=dump.osm
> Here's the exception stack trace:
> Exception in thread "Thread-1-read-xml-0.5 "
> com.bretth.osmosis.core.OsmosisRuntimeException: Unable to read XML file.
> at java.lang.Thread.run(Unknown Source)
> Caused by:
> Invalid byte 2 of 3-byte UTF-8 sequence.
> com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
> ... etc.
> I'm using osmosis on Windows and have written my own batch file
> wrapper which mostly duplicates the shell script functions. I have
> successfully used osmosis to read from the 0.5 api on
> openstreetmap.gryph.de, so I think
> osmosis is working correctly.
> Could it be a line-endings problem? Is there a known issue about
> UTF-8? As you can see, unfortunately the exception gives no line
> number in the source document, so it's impossible to nail it down.
> However, the exception happens almost immediately, so it must be
> occurring early in the file. I didn't see anything strange peeking at
> it with head.
> Thanks for your time.
> Karl Newman
I agree, the lack of line information is irritating. It should be
possible to get the Java SAX parser to report a line number, but I
haven't got it working yet. I'll check out the 0.5 planet now. It
should be possible to find the problem line by running a direct
osmosis copy from one file to another; in other words, remove the
bounding box task. The problem planet entry will be the one
immediately after the last entry written to the output file.
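A complementary check that depends on neither osmosis nor the SAX parser is to scan the file for invalid UTF-8 directly. The following is a minimal sketch, not part of osmosis: the class `Utf8Check` and its `findFirstMalformed` method are hypothetical names, and it assumes Java 7 or later (for try-with-resources). It streams the file in 64 KB chunks through a strict `CharsetDecoder`, so a multi-gigabyte planet file is never held in memory, and reports the 0-based byte offset and 1-based line number of the first malformed sequence. Lines are counted from `0x0A` bytes, which is safe because that byte never occurs inside a multi-byte UTF-8 sequence; for the same reason, CR or LF line endings are plain ASCII and cannot by themselves trigger this error.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

/** Hypothetical helper (not part of osmosis): finds the first invalid UTF-8 sequence in a stream. */
public class Utf8Check {

    /** Location of a malformed sequence: 0-based byte offset and 1-based line number. */
    static final class Position {
        final long byteOffset;
        final long line;

        Position(long byteOffset, long line) {
            this.byteOffset = byteOffset;
            this.line = line;
        }
    }

    /** Returns the first malformed position, or null if the whole stream is valid UTF-8. */
    static Position findFirstMalformed(InputStream stream) throws IOException {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.allocate(64 * 1024);   // in "fill" mode between reads
        CharBuffer out = CharBuffer.allocate(64 * 1024);  // decoded text is discarded
        long consumed = 0;  // bytes fully decoded in earlier rounds
        long line = 1;

        while (true) {
            int n = stream.read(in.array(), in.position(), in.remaining());
            boolean endOfInput = n == -1;
            if (n > 0) {
                in.position(in.position() + n);
            }
            in.flip();  // switch to "drain" mode for the decoder

            CoderResult result;
            do {
                out.clear();  // we only validate, so throw away decoded chars
                result = decoder.decode(in, out, endOfInput);
            } while (result.isOverflow());

            // Count newlines among the bytes accepted in this round (all before any bad sequence).
            for (int i = 0; i < in.position(); i++) {
                if (in.get(i) == '\n') {
                    line++;
                }
            }
            if (result.isError()) {
                // On error the decoder leaves the position at the start of the bad sequence.
                return new Position(consumed + in.position(), line);
            }
            if (endOfInput) {
                return null;
            }
            consumed += in.position();
            in.compact();  // keep a trailing partial sequence for the next read
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream stream = new FileInputStream(args[0])) {
            Position bad = findFirstMalformed(stream);
            if (bad == null) {
                System.out.println("valid UTF-8");
            } else {
                System.out.println("first malformed sequence at byte offset "
                        + bad.byteOffset + ", line " + bad.line);
            }
        }
    }
}
```

It would be run as `java Utf8Check planet-api05-070905.osm`; the reported offset can then be inspected in any hex viewer to see exactly which bytes the parser rejected.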