[OSM-dev] planet-latest.osm.bz2: Invalid byte 2 of 4-byte UTF-8 sequence.

Wolfram Schneider wosch at freebsd.org
Sun Feb 8 22:03:05 UTC 2015


On 3 February 2015 at 00:14, Wolfram Schneider <wosch at freebsd.org> wrote:
> Hi,
>
> I downloaded http://planet.osm.org/planet/planet-latest.osm.bz2
> (alias http://planet.osm.org/planet/2015/planet-150126.osm.bz2)

Good news: the latest planet dump, from 2-Feb-2015
(http://planet.osm.org/planet/2015/planet-150202.osm.bz2), works fine.
No more parse errors.

-Wolfram


> If I try to parse the XML with osmosis, I get a UTF-8 parse error.
> Any ideas what's wrong here?
>
>
> $ bzip2 -dc planet-latest.osm.bz2 | osmosis --read-xml /dev/stdin \
>     --write-pbf planet-latest.osm.pbf
>
>
> Feb 02, 2015 5:36:48 PM org.openstreetmap.osmosis.core.Osmosis run
> INFO: Osmosis Version 0.43.1
> [...]
> Feb 02, 2015 5:56:27 PM
> org.openstreetmap.osmosis.core.pipeline.common.ActiveTaskManager
> waitForCompletion
> SEVERE: Thread for task 1-read-xml failed
> org.openstreetmap.osmosis.core.OsmosisRuntimeException: Unable to
> parse xml file /dev/stdin.  publicId=(null), systemId=(null),
> lineNumber=522892716, columnNumber=102.
>         at org.openstreetmap.osmosis.xml.v0_6.XmlReader.run(XmlReader.java:116)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.xml.sax.SAXParseException; lineNumber: 522892716;
> columnNumber: 102; Invalid byte 2 of 4-byte UTF-8 sequence.
>         at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown
> Source)
>         at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
>         at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
>         at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
>         at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
> Source)
>         at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
> Source)
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>         at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown
> Source)
>         at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
>         at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
>         at org.openstreetmap.osmosis.xml.v0_6.XmlReader.run(XmlReader.java:111)
>         ... 1 more
> Caused by: org.apache.xerces.impl.io.MalformedByteSequenceException:
> Invalid byte 2 of 4-byte UTF-8 sequence.
>         at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
>         at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
>         at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
>         at org.apache.xerces.impl.XMLEntityScanner.scanName(Unknown Source)
>         at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanAttribute(Unknown
> Source)
>         at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown
> Source)
>         ... 11 more
>
> I'm using Osmosis version 0.43.1 on Debian 7.8 with Java version "1.7.0_75".
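
For anyone who hits the same error on another dump: the trace above only
gives a line and column, not the offending bytes. A minimal sketch of how
one might locate the first invalid UTF-8 sequence independently of osmosis
follows. The function name find_invalid_utf8 is hypothetical, the column it
reports is counted in bytes (which may differ from the character column
Xerces prints), and Python's strict decoder is assumed to be close enough
to the Xerces UTF8Reader for this purpose. Expect it to be slow on a full
planet file.

```python
import bz2


def find_invalid_utf8(stream):
    """Scan a binary stream line by line for the first invalid UTF-8 sequence.

    Returns (line_number, byte_column, byte_offset, reason), or None if every
    line decodes cleanly. Line and column are 1-based; the offset is 0-based
    and counts bytes of the decompressed stream.
    """
    offset = 0
    # Splitting on b"\n" is safe: 0x0A never occurs inside a valid
    # multi-byte UTF-8 sequence, so a valid sequence is never cut in half.
    for line_number, raw in enumerate(stream, start=1):
        try:
            raw.decode("utf-8")  # strict decoding, raises on bad bytes
        except UnicodeDecodeError as err:
            return line_number, err.start + 1, offset + err.start, err.reason
        offset += len(raw)
    return None


def check_bz2_file(path):
    """Convenience wrapper (hypothetical) for a .bz2 dump on disk."""
    with bz2.open(path, "rb") as stream:
        return find_invalid_utf8(stream)
```

Calling check_bz2_file("planet-latest.osm.bz2") would return either None or
a tuple such as (line, column, offset, reason); the offset can then be used
to dump the surrounding bytes for a closer look.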

--
Wolfram Schneider <wosch at FreeBSD.org> http://wolfram.schneider.org
Planet.osm extracts: http://extract.bbbike.org
BBBike Map Compare: http://bbbike.org/mc


