[OSM-dev] Osmosis data corruption on Debian Jessie/Testing

Jochen Topf jochen at remote.org
Wed Mar 4 23:16:12 UTC 2015


Hi!

Just spent a few hours debugging this problem: the way Osmosis is packaged
on Debian Jessie seems to be wrong. It doesn't use the Xerces XML parser
but seems to fall back to Java's default XML parser, which mangles Unicode
characters.

This can lead to data corruption (and did for me today) when using Osmosis
for planet updates etc.

You can test whether your system is affected, too: download the XML for
node 3382756758 (http://www.openstreetmap.org/node/3382756758) and run
it through Osmosis:

    osmosis --rx 3382756758.osm --wx out.osm

Compare the two files: if your Osmosis is broken, the musical notation
character is doubled in the output file. The fix is simple: add the line
"load /usr/share/java/xercesImpl.jar" to /etc/osmosis/plexus.conf. As I
understand it, this tells Java to load Xerces in place of the built-in
XML parser.
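Comparing the two files by eye can be tedious, because Osmosis may format the XML differently from the API (for example the generator attribute), so a plain diff shows unrelated changes. A minimal sketch, not part of Osmosis: a small Java program (the class name CompareNonAscii is hypothetical) that counts every non-ASCII code point in the downloaded file and in out.osm and reports any whose count changed. It reads the raw UTF-8 bytes, so the check itself does not go through the XML parser under suspicion. It assumes both files are UTF-8 and that a correctly working Osmosis only changes ASCII formatting.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Osmosis: compares non-ASCII character counts of two files.
public class CompareNonAscii {

    // Count each code point >= U+0080; ASCII is ignored because formatting may differ.
    static Map<Integer, Integer> histogram(String text) {
        Map<Integer, Integer> counts = new TreeMap<>();
        text.codePoints()
            .filter(cp -> cp >= 0x80)
            .forEach(cp -> counts.merge(cp, 1, Integer::sum));
        return counts;
    }

    // Code points whose counts differ, mapped to {count before, count after}.
    static Map<Integer, int[]> differences(String before, String after) {
        Map<Integer, Integer> a = histogram(before);
        Map<Integer, Integer> b = histogram(after);
        Set<Integer> all = new TreeSet<>(a.keySet());
        all.addAll(b.keySet());
        Map<Integer, int[]> diff = new TreeMap<>();
        for (int cp : all) {
            int inBefore = a.getOrDefault(cp, 0);
            int inAfter = b.getOrDefault(cp, 0);
            if (inBefore != inAfter) {
                diff.put(cp, new int[] {inBefore, inAfter});
            }
        }
        return diff;
    }

    // Malformed UTF-8 is decoded as U+FFFD, which then also shows up as a difference.
    static String readUtf8(String path) throws IOException {
        return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java CompareNonAscii <input.osm> <output.osm>");
            System.exit(2);
        }
        Map<Integer, int[]> diff = differences(readUtf8(args[0]), readUtf8(args[1]));
        if (diff.isEmpty()) {
            System.out.println("OK: non-ASCII characters are identical");
        } else {
            diff.forEach((cp, c) ->
                System.out.printf("U+%04X: %d in input, %d in output%n", cp, c[0], c[1]));
            System.exit(1);
        }
    }
}
```

Usage, after running the osmosis command above:

    javac CompareNonAscii.java && java CompareNonAscii 3382756758.osm out.osm

A doubled character would show up as a line like "U+XXXX: 1 in input, 2 in output" and a non-zero exit status.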

I have opened a bug with Debian.

Arguably Osmosis should somehow detect when Xerces isn't found and return an
error instead of silently using a different implementation. But I don't know
enough about Java to say whether that's possible.
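One way such a check might look, as a minimal sketch assuming Osmosis obtains its parser through JAXP's SAXParserFactory (not confirmed here): ask JAXP which factory class it resolved and refuse to continue unless it belongs to standalone Apache Xerces (org.apache.xerces.*) rather than the JDK's internal copy (com.sun.org.apache.xerces.internal.*). The class RequireXerces and its messages are hypothetical, not part of Osmosis.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical startup check, not part of Osmosis: fail instead of falling back silently.
public class RequireXerces {

    // Class name of the SAXParserFactory that JAXP resolves with the current classpath.
    static String saxFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    // True for standalone Apache Xerces, false for the JDK's repackaged internal copy.
    static boolean isApacheXerces(String className) {
        return className.startsWith("org.apache.xerces.");
    }

    public static void main(String[] args) {
        String impl = saxFactoryClass();
        System.out.println("SAXParserFactory implementation: " + impl);
        if (!isApacheXerces(impl)) {
            System.err.println("Error: Xerces not found; refusing to fall back to " + impl);
            System.exit(1);
        }
    }
}
```

Running it with and without the jar on the classpath should show the difference:

    java -cp .:/usr/share/java/xercesImpl.jar RequireXerces

An alternative design is to request Xerces by name with SAXParserFactory.newInstance("org.apache.xerces.jaxp.SAXParserFactoryImpl", null), which throws a FactoryConfigurationError when the class is missing instead of quietly picking another implementation.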

Jochen
-- 
Jochen Topf  jochen at remote.org  http://www.jochentopf.com/  +49-173-7019282


