[OSM-dev] Osmosis data corruption on Debian Jessie/Testing

Brett Henderson brett at bretth.com
Mon Nov 2 10:56:51 UTC 2015


I've added the check.  Processing will abort if Unicode support is broken,
and the error message suggests including a newer version of Xerces.
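
For anyone following along, here is a minimal sketch of what a self test of
this kind might look like. It is not the code committed to Osmosis: the class
name, the element layout, and the sample character (U+1D11E, assumed here as a
stand-in for the musical notation character in that node) are all made up for
illustration. It parses a tiny embedded document with whatever SAX parser the
JVM hands out and checks that a character outside the Basic Multilingual Plane
comes back unchanged.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical self test; the names here are illustrative, not Osmosis APIs.
final class XmlUnicodeSelfTest {
    // U+1D11E MUSICAL SYMBOL G CLEF: one code point, two UTF-16 chars.
    // Assumed sample; the character in the real node may differ.
    static final String SAMPLE = new String(Character.toChars(0x1D11E));

    // Parses a one-element document and returns the attribute value exactly
    // as the default SAX parser reports it. The value must not contain XML
    // special characters, since it is pasted into the document unescaped.
    static String roundTrip(String value) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<tag k=\"name\" v=\"" + value + "\"/>";
        final StringBuilder seen = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                seen.append(attrs.getValue("v"));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML self test could not run", e);
        }
        return seen.toString();
    }

    // Aborts with an error if the sample character does not survive parsing.
    static void check() {
        if (!SAMPLE.equals(roundTrip(SAMPLE))) {
            throw new IllegalStateException(
                    "XML parser mangles Unicode; try adding a newer Xerces to the classpath");
        }
    }

    public static void main(String[] args) {
        check();
        System.out.println("XML parser handled the sample character correctly");
    }
}
```

Calling check() once before any XML task runs keeps the test independent of
which parser implementation is on the classpath, which avoids the brittleness
of trying to detect the runtime. Whether such a small document triggers the
bug on every affected parser is something I have not verified.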

On Mon, 2 Nov 2015 at 17:30 Brett Henderson <brett at bretth.com> wrote:

> Ah, great, I'll take a look this evening and see if I can add a runtime
> check.
>
> On Mon, Nov 2, 2015, 5:15 PM Jochen Topf <jochen at remote.org> wrote:
>
>> As I mentioned below:
>> > > > > You can test whether this bug is on your system, too: Download the XML
>> > > > > for this node: http://www.openstreetmap.org/node/3382756758. Then run
>> > > > > it through osmosis:
>> > > > >
>> > > > >     osmosis --rx 3382756758.osm --wx out.osm
>> > > > > Compare the two files, you'll see the musical notation character doubling
>>
>> On Mo, Nov 02, 2015 at 05:34:14 +0000, Brett Henderson wrote:
>> > Sorry, I'm terrible at checking this list.  6 months isn't ideal.  Does
>> > anybody have an XML snippet that I could use for such a test?
>> >
>> > On Fri, 6 Mar 2015 at 23:50 Jochen Topf <jochen at remote.org> wrote:
>> >
>> > > I think the bug is important and subtle enough that we should make sure it
>> > > doesn't resurface again. Either by detecting the runtime or by the check
>> > > you describe. At least we should put the check into a unit test, so that
>> > > people who run the tests on their platform after building can be safe.
>> > >
>> > > Jochen
>> > >
>> > > On Thu, Mar 05, 2015 at 05:25:12PM +1100, Brett Henderson wrote:
>> > > > Date: Thu, 5 Mar 2015 17:25:12 +1100
>> > > > From: Brett Henderson <brett at bretth.com>
>> > > > To: Jochen Topf <jochen at remote.org>
>> > > > Cc: OSM-Dev Openstreetmap <dev at openstreetmap.org>
>> > > > Subject: Re: [OSM-dev] Osmosis data corruption on Debian Jessie/Testing
>> > > >
>> > > > I suspect that attempting to detect the underlying XML runtime would be
>> > > > brittle.  Another option might be to embed that bit of data in Osmosis
>> > > > itself and do a self test before attempting to execute any XML tasks.
>> > > >
>> > > > I'm surprised that this is still an issue in standard Java.  I tried
>> > > > raising tickets against Sun Java before it moved under Oracle but never got
>> > > > a response.  I gave up, embedded Xerces in the main Osmosis distribution,
>> > > > and then forgot about it.
>> > > >
>> > > > On 5 March 2015 at 10:16, Jochen Topf <jochen at remote.org> wrote:
>> > > >
>> > > > > Hi!
>> > > > >
>> > > > > Just spent a few hours debugging this problem: The way Osmosis is packaged
>> > > > > on Debian Jessie seems to be wrong. It doesn't use the Xerces XML parser
>> > > > > but seems to fall back to Java default XML parser which mangles Unicode
>> > > > > characters.
>> > > > >
>> > > > > This can lead to data corruption (and has for me today) when using Osmosis
>> > > > > for planet updates etc.
>> > > > >
>> > > > > You can test whether this bug is on your system, too: Download the XML
>> > > > > for this node: http://www.openstreetmap.org/node/3382756758. Then run
>> > > > > it through osmosis:
>> > > > >
>> > > > >     osmosis --rx 3382756758.osm --wx out.osm
>> > > > >
>> > > > > Compare the two files, you'll see the musical notation character doubling
>> > > > > in the second case when your Osmosis is broken. The fix is simple: Add
>> > > > > a line "load /usr/share/java/xercesImpl.jar" to /etc/osmosis/plexus.conf.
>> > > > > As I understand this, it tells Java to load Xerces replacing the built-in
>> > > > > XML parser.
>> > > > >
>> > > > > I have opened a bug with Debian.
>> > > > >
>> > > > > Arguably Osmosis should somehow detect when Xerces isn't found and return
>> > > > > an error instead of using a different implementation. But I don't know
>> > > > > enough about Java to say whether that's possible.
>> > > > >
>> > > > > Jochen
>> > > > > --
>> > > > > Jochen Topf  jochen at remote.org  http://www.jochentopf.com/
>> > > > > +49-173-7019282
>> > > > >
>> > > > > _______________________________________________
>> > > > > dev mailing list
>> > > > > dev at openstreetmap.org
>> > > > > https://lists.openstreetmap.org/listinfo/dev
>> > > > >
>> > >
>> > > --
>> > > Jochen Topf  jochen at remote.org  http://www.jochentopf.com/
>> > > +49-173-7019282
>> > >
>>
>> --
>> Jochen Topf  jochen at remote.org  http://www.jochentopf.com/
>> +49-351-31778688
>>
>