[OSM-dev] planet-latest.osm.bz2: Invalid byte 2 of 4-byte UTF-8 sequence.
Wolfram Schneider
wosch at freebsd.org
Mon Feb 2 23:14:50 UTC 2015
Hi,
I downloaded http://planet.osm.org/planet/planet-latest.osm.bz2
(alias http://planet.osm.org/planet/2015/planet-150126.osm.bz2)
If I try to parse the XML with Osmosis, I get a UTF-8 parse error.
Any ideas what's wrong here?
$ bzip2 -dc planet-latest.osm.bz2 | osmosis --read-xml /dev/stdin \
    --write-pbf planet-latest.osm.pbf
Feb 02, 2015 5:36:48 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Osmosis Version 0.43.1
[...]
Feb 02, 2015 5:56:27 PM org.openstreetmap.osmosis.core.pipeline.common.ActiveTaskManager waitForCompletion
SEVERE: Thread for task 1-read-xml failed
org.openstreetmap.osmosis.core.OsmosisRuntimeException: Unable to parse xml file /dev/stdin. publicId=(null), systemId=(null), lineNumber=522892716, columnNumber=102.
at org.openstreetmap.osmosis.xml.v0_6.XmlReader.run(XmlReader.java:116)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.xml.sax.SAXParseException; lineNumber: 522892716; columnNumber: 102; Invalid byte 2 of 4-byte UTF-8 sequence.
at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
at org.openstreetmap.osmosis.xml.v0_6.XmlReader.run(XmlReader.java:111)
... 1 more
Caused by: org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.scanName(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanAttribute(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
... 11 more
I'm using Osmosis version 0.43.1 on Debian 7.8 with Java version "1.7.0_75".
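
One way to narrow this down might be to scan the decompressed stream independently of Osmosis and list the lines that do not decode as UTF-8. In UTF-8, a 4-byte sequence starts with a lead byte of the form 11110xxx followed by three continuation bytes of the form 10xxxxxx, so the message suggests that the byte right after such a lead byte is not a continuation byte. Below is a minimal sketch in Python. The function name find_invalid_utf8 and the script name are made up for illustration and are not part of Osmosis. Also note that it reports byte offsets within a line, which need not match the column number Xerces prints.

```python
import sys


def find_invalid_utf8(stream, max_hits=10):
    """Return up to max_hits tuples (line_number, byte_offset, context)
    for lines of a binary stream that are not valid UTF-8."""
    hits = []
    for lineno, raw in enumerate(stream, start=1):
        try:
            raw.decode("utf-8")  # strict decoding, raises on malformed bytes
        except UnicodeDecodeError as exc:
            # Keep a few bytes around the first offending position for inspection.
            context = raw[max(0, exc.start - 20):exc.end + 20]
            hits.append((lineno, exc.start, context))
            if len(hits) >= max_hits:
                break
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # The input path is taken from the command line, e.g. /dev/stdin.
    with open(sys.argv[1], "rb") as f:
        for lineno, offset, context in find_invalid_utf8(f):
            print(f"line {lineno}, byte offset {offset}: {context!r}")
```

Assuming the sketch is saved as find_bad_utf8.py, it could be run the same way as the Osmosis command above:

$ bzip2 -dc planet-latest.osm.bz2 | python3 find_bad_utf8.py /dev/stdin

If this reports nothing for the whole file, the bytes in the dump are probably fine and the problem is more likely somewhere in the parser or the pipe.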
--
Wolfram Schneider <wosch at FreeBSD.org> http://wolfram.schneider.org
Planet.osm extracts: http://extract.bbbike.org
BBBike Map Compare: http://bbbike.org/mc