[OSM-dev] planet-latest.osm.bz2: Invalid byte 2 of 4-byte UTF-8 sequence.

Wolfram Schneider wosch at freebsd.org
Mon Feb 2 23:14:50 UTC 2015


Hi,

I downloaded http://planet.osm.org/planet/planet-latest.osm.bz2
(alias http://planet.osm.org/planet/2015/planet-150126.osm.bz2)

If I try to parse the XML with Osmosis, I get a UTF-8 parse error.
Any ideas what's wrong here?


$ bzip2 -dc planet-latest.osm.bz2 | osmosis --read-xml /dev/stdin \
    --write-pbf planet-latest.osm.pbf


Feb 02, 2015 5:36:48 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Osmosis Version 0.43.1
[...]
Feb 02, 2015 5:56:27 PM org.openstreetmap.osmosis.core.pipeline.common.ActiveTaskManager waitForCompletion
SEVERE: Thread for task 1-read-xml failed
org.openstreetmap.osmosis.core.OsmosisRuntimeException: Unable to parse xml file /dev/stdin.  publicId=(null), systemId=(null), lineNumber=522892716, columnNumber=102.
        at org.openstreetmap.osmosis.xml.v0_6.XmlReader.run(XmlReader.java:116)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.xml.sax.SAXParseException; lineNumber: 522892716; columnNumber: 102; Invalid byte 2 of 4-byte UTF-8 sequence.
        at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
        at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
        at org.openstreetmap.osmosis.xml.v0_6.XmlReader.run(XmlReader.java:111)
        ... 1 more
Caused by: org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
        at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
        at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.scanName(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanAttribute(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
        ... 11 more

I'm using Osmosis version 0.43.1 on Debian 7.8, with Java version "1.7.0_75".
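
One way to narrow this down might be to check whether the decompressed
stream really contains malformed UTF-8 near the reported line, independent
of the XML parser. The following is only a minimal sketch under that
assumption: the function name find_invalid_utf8 and the script name
find_bad_utf8.py are made up for illustration, and it uses Python's strict
UTF-8 decoder rather than anything from Osmosis or Xerces. It reports byte
offsets within a line, which need not match the parser's column number.

```python
import sys


def find_invalid_utf8(stream, max_reports=1):
    """Return (line_number, byte_offset, bad_bytes) for lines that fail strict UTF-8 decoding.

    `stream` must be a binary stream; iterating over it yields lines split on b"\\n".
    Scanning stops after `max_reports` failing lines.
    """
    reports = []
    for line_number, raw in enumerate(stream, start=1):
        try:
            raw.decode("utf-8")  # strict mode: raises on any malformed sequence
        except UnicodeDecodeError as err:
            # err.start / err.end delimit the offending bytes within this line
            reports.append((line_number, err.start, raw[err.start:err.end]))
            if len(reports) >= max_reports:
                break
    return reports


if __name__ == "__main__":
    # Read raw bytes from stdin so no implicit decoding happens first.
    for line_number, offset, bad in find_invalid_utf8(sys.stdin.buffer):
        print(f"line {line_number}, byte offset {offset}: {bad.hex()}")
```

It could be run as
"bzip2 -dc planet-latest.osm.bz2 | python3 find_bad_utf8.py". If it prints
nothing, the bytes decode cleanly under Python's rules, which would suggest
looking at the parser or the pipeline rather than the planet file itself.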

-- 
Wolfram Schneider <wosch at FreeBSD.org> http://wolfram.schneider.org
Planet.osm extracts: http://extract.bbbike.org
BBBike Map Compare: http://bbbike.org/mc


