<p dir="ltr">Ah, great. I'll take a look this evening and see if I can add a runtime check.</p>
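<p dir="ltr">A minimal sketch of such a runtime check, along the lines of the self-test Brett suggests below. It assumes the check runs against whatever parser JAXP's SAXParserFactory.newInstance() resolves to on the Osmosis classpath. The class name XmlUnicodeSelfTest, the choice of U+1D11E MUSICAL SYMBOL G CLEF as the test character, and the padding loop are illustrative assumptions, not existing Osmosis code. The check builds a tag-style attribute full of non-BMP characters, parses it back, and fails if the parser returns anything other than the exact input.</p>

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class XmlUnicodeSelfTest {
    // U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP: 4 bytes in UTF-8,
    // a surrogate pair (2 chars) in a Java String.
    static final String G_CLEF = new String(Character.toChars(0x1D11E));

    // Builds `pad` filler chars followed by `repeats` copies of "x" + G clef.
    static String buildValue(int pad, int repeats) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pad; i++) {
            sb.append('y');
        }
        for (int i = 0; i < repeats; i++) {
            sb.append('x').append(G_CLEF);
        }
        return sb.toString();
    }

    // Parses a one-element UTF-8 document with whatever SAX parser JAXP
    // resolves and returns the "v" attribute exactly as the parser reports it.
    static String parseAttribute(String value) throws Exception {
        String xml = "<?xml version='1.0' encoding='UTF-8'?><tag k='test' v='" + value + "'/>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        final String[] seen = new String[1];
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(bytes), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                seen[0] = attrs.getValue("v");
            }
        });
        return seen[0];
    }

    // Returns false if any round trip alters the value. Each "x" + G clef unit
    // is 3 chars / 5 UTF-8 bytes, so pads 0..14 place the surrogate pairs at
    // every char offset mod 3 and byte offset mod 5, whatever buffer sizes
    // the parser uses internally.
    static boolean selfTest() throws Exception {
        for (int pad = 0; pad < 15; pad++) {
            String expected = buildValue(pad, 5000);
            if (!expected.equals(parseAttribute(expected))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("SAXParserFactory: " + SAXParserFactory.newInstance().getClass().getName());
        if (!selfTest()) {
            System.err.println("XML parser mangles non-BMP characters; aborting.");
            System.exit(1);
        }
        System.out.println("XML Unicode self-test passed.");
    }
}
```

<p dir="ltr">Running it with the same classpath Osmosis uses and checking for a nonzero exit status would flag a broken installation. The synthetic value is not guaranteed to hit every trigger of the original bug, so embedding the XML of node 3382756758 itself, as Brett proposes, would be a stronger test.</p>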
<br><div class="gmail_quote"><div dir="ltr">On Mon, Nov 2, 2015, 5:15 PM Jochen Topf <<a href="mailto:jochen@remote.org">jochen@remote.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">As I mentioned below:<br>
> > > > You can test whether this bug is on your system, too: Download the XML<br>
> > > > for this node: <a href="http://www.openstreetmap.org/node/3382756758" rel="noreferrer" target="_blank">http://www.openstreetmap.org/node/3382756758</a>. Then run<br>
> > > > it through osmosis:<br>
> > > ><br>
> > > > osmosis --rx 3382756758.osm --wx out.osm<br>
> > > > Compare the two files; you'll see the musical notation character doubling<br>
<br>
On Mon, Nov 02, 2015 at 05:34:14 +0000, Brett Henderson wrote:<br>
> Sorry, I'm terrible at checking this list. 6 months isn't ideal. Does<br>
> anybody have an XML snippet that I could use for such a test?<br>
><br>
> On Fri, 6 Mar 2015 at 23:50 Jochen Topf <<a href="mailto:jochen@remote.org" target="_blank">jochen@remote.org</a>> wrote:<br>
><br>
> > I think the bug is important and subtle enough that we should make sure it<br>
> > doesn't resurface, either by detecting the runtime or by the check you<br>
> > describe. At the very least we should put the check into a unit test so that<br>
> > people who run the tests on their platform after building can be safe.<br>
> ><br>
> > Jochen<br>
> ><br>
> > On Thu, Mar 05, 2015 at 05:25:12PM +1100, Brett Henderson wrote:<br>
> > > Date: Thu, 5 Mar 2015 17:25:12 +1100<br>
> > > From: Brett Henderson <<a href="mailto:brett@bretth.com" target="_blank">brett@bretth.com</a>><br>
> > > To: Jochen Topf <<a href="mailto:jochen@remote.org" target="_blank">jochen@remote.org</a>><br>
> > > Cc: OSM-Dev Openstreetmap <<a href="mailto:dev@openstreetmap.org" target="_blank">dev@openstreetmap.org</a>><br>
> > > Subject: Re: [OSM-dev] Osmosis data corruption on Debian Jessie/Testing<br>
> > ><br>
> > > I suspect that attempting to detect the underlying XML runtime would be<br>
> > > brittle. Another option might be to embed that bit of data in Osmosis<br>
> > > itself and run a self-test before executing any XML tasks.<br>
> > ><br>
> > > I'm surprised that this is still an issue in standard Java. I tried<br>
> > > raising tickets against Sun Java before it moved under Oracle but never got<br>
> > > a response. I gave up, embedded Xerces in the main Osmosis distribution,<br>
> > > and then forgot about it.<br>
> > ><br>
> > > On 5 March 2015 at 10:16, Jochen Topf <<a href="mailto:jochen@remote.org" target="_blank">jochen@remote.org</a>> wrote:<br>
> > ><br>
> > > > Hi!<br>
> > > ><br>
> > > > Just spent a few hours debugging this problem: the way Osmosis is packaged<br>
> > > > on Debian Jessie seems to be wrong. It doesn't use the Xerces XML parser<br>
> > > > but seems to fall back to the default Java XML parser, which mangles Unicode<br>
> > > > characters.<br>
> > > ><br>
> > > > This can lead to data corruption (and has for me today) when using Osmosis<br>
> > > > for planet updates, etc.<br>
> > > ><br>
> > > > You can test whether this bug is on your system, too: Download the XML<br>
> > > > for this node: <a href="http://www.openstreetmap.org/node/3382756758" rel="noreferrer" target="_blank">http://www.openstreetmap.org/node/3382756758</a>. Then run<br>
> > > > it through osmosis:<br>
> > > ><br>
> > > > osmosis --rx 3382756758.osm --wx out.osm<br>
> > > ><br>
> > > > Compare the two files; you'll see the musical notation character doubling<br>
> > > > in the second file when your Osmosis is broken. The fix is simple: add<br>
> > > > the line "load /usr/share/java/xercesImpl.jar" to /etc/osmosis/plexus.conf.<br>
> > > > As I understand it, this tells Java to load Xerces, replacing the built-in<br>
> > > > XML parser.<br>
> > > ><br>
> > > > I have opened a bug with Debian.<br>
> > > ><br>
> > > > Arguably Osmosis should somehow detect when Xerces isn't found and return<br>
> > > > an error instead of using a different implementation. But I don't know enough<br>
> > > > about Java to say whether that's possible.<br>
> > > ><br>
> > > > Jochen<br>
> > > > --<br>
> > > > Jochen Topf <a href="mailto:jochen@remote.org" target="_blank">jochen@remote.org</a> <a href="http://www.jochentopf.com/" rel="noreferrer" target="_blank">http://www.jochentopf.com/</a><br>
> > > > +49-173-7019282<br>
> > > ><br>
> > > > _______________________________________________<br>
> > > > dev mailing list<br>
> > > > <a href="mailto:dev@openstreetmap.org" target="_blank">dev@openstreetmap.org</a><br>
> > > > <a href="https://lists.openstreetmap.org/listinfo/dev" rel="noreferrer" target="_blank">https://lists.openstreetmap.org/listinfo/dev</a><br>
> > > ><br>
> ><br>
<br>
--<br>
Jochen Topf <a href="mailto:jochen@remote.org" target="_blank">jochen@remote.org</a> <a href="http://www.jochentopf.com/" rel="noreferrer" target="_blank">http://www.jochentopf.com/</a> +49-351-31778688<br>
</blockquote></div>
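<p dir="ltr">On Jochen's question of whether Osmosis can detect that Xerces is missing: a minimal sketch using only standard JAXP and reflection calls. It assumes xercesImpl.jar provides org.apache.xerces.jaxp.SAXParserFactoryImpl and registers it as the JAXP SAX provider, and that the check runs under the same class loader Osmosis parses with (under Plexus Classworlds that is not guaranteed). This is the kind of runtime detection Brett expects to be brittle; XercesCheck is a hypothetical name.</p>

```java
import javax.xml.parsers.SAXParserFactory;

public class XercesCheck {
    // SAX factory class shipped in Apache Xerces2-J's xercesImpl.jar.
    static final String XERCES_FACTORY = "org.apache.xerces.jaxp.SAXParserFactoryImpl";

    // True if the Xerces factory class can be loaded by this class's loader.
    static boolean xercesOnClasspath() {
        try {
            Class.forName(XERCES_FACTORY);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Fully qualified name of the factory class JAXP actually selected.
    static String activeFactory() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("JAXP selected: " + activeFactory());
        // Refuse to run unless Xerces is both present and the selected provider.
        if (!xercesOnClasspath() || !XERCES_FACTORY.equals(activeFactory())) {
            System.err.println("Xerces is not the active SAX parser; refusing to run.");
            System.exit(1);
        }
    }
}
```

<p dir="ltr">Unlike the data-driven self-test Brett describes, this only confirms which implementation is loaded, not that it handles non-BMP characters correctly.</p>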