[OSM-dev] Odd data in daily diffs (potlatch related?)
Jon Burgess
jburgess777 at googlemail.com
Sat Mar 29 13:20:03 GMT 2008
On Sat, 2008-03-29 at 12:41 +0100, Frederik Ramm wrote:
> Hi,
>
> > In the file daily-20080326-20080327.osc.bz2 there is this relation:
> >
> > <relation id="8571" timestamp="2008-03-26T22:05:03Z" user="wiesel111">
> > <tag k="ESCESC" v=""/>
> > <tag k="created_by" v="Potlatch 0.8"/>
> > <tag k="type" v=""/>
> > </relation>
> >
> > Those are real escapes "\x1d". Fetching via the API doesn't have them,
> > the osmosis XML parser is barfing on them. Looks like some mismatch
> > between the output and input of osmosis here.
>
> Seems to be two problems in one, first: how did the key get in there
> in the first place, second: why does it not get exported in a way that
> Osmosis can read.
>
> I was hoping to fix the diff by simply running "recode" on it and
> instructing it to ignore invalid characters, however I was surprised
> to see that recode converted the file from UTF8 to UTF16 without
> complaint (and back again to give an identical file). - Would running
> one of the many existing "UTF8 sanitizers" have resolved the problem?
Character 27 is valid UTF-8, but is not valid as content within an XML
document: http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char
More details and some Java code which might be useful for Osmosis:
http://cse-mjmcl.cse.bris.ac.uk/blog/2007/02/14/1171465494443.html
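To illustrate the distinction, here is a small Python sketch (not taken from Osmosis or the planet dump tools; the sample element is only modelled on the relation quoted above). It assumes character 27, the escape character, and checks that it round-trips through UTF-8 while an XML parser refuses it:

```python
import xml.etree.ElementTree as ET

ESC = chr(27)  # the escape control character, U+001B

# Encoding and decoding as UTF-8 succeeds: ESC is a single valid byte.
encoded = ESC.encode("utf-8")
utf8_ok = encoded.decode("utf-8") == ESC

# The same character inside an XML attribute value is not allowed by the
# XML 1.0 Char production, so the parser should reject the document.
sample = '<tag k="' + ESC + ESC + '" v=""/>'
try:
    ET.fromstring(sample)
    xml_ok = True
except ET.ParseError:
    xml_ok = False

print("valid UTF-8:", utf8_ok)
print("well-formed XML:", xml_ok)
```

This is why a pure UTF-8 sanitizer or a recode round trip leaves the file unchanged: the bytes are fine as UTF-8, and the problem only exists at the XML level.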
I dumped the same data myself with the planet dump tools and it produces
the same invalid output. I have added a line into the planet dump code
to replace this with a ?.
Now that I have found the links above I should perhaps add an even
stricter test to drop everything < 32 except for 9, 10 & 13.
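A minimal sketch of that stricter test, in Python for illustration only (this is not the actual planet dump code, and the function name `sanitize_xml_text` is made up for this example). It keeps 9, 10 and 13, and either replaces or drops every other character below 32:

```python
def sanitize_xml_text(text, replacement="?"):
    """Replace characters below 32, except tab (9), LF (10) and CR (13).

    Pass replacement="" to drop the offending characters instead.
    """
    allowed_controls = {9, 10, 13}
    return "".join(
        ch if ord(ch) >= 32 or ord(ch) in allowed_controls else replacement
        for ch in text
    )


# Hypothetical key similar to the one in the broken relation.
print(sanitize_xml_text("\x1b\x1b"))               # replaced with "?"
print(repr(sanitize_xml_text("a\tb\x1dc", "")))    # control char dropped
```

Note that this only covers the range below 32. The XML 1.0 Char production also excludes surrogates and U+FFFE/U+FFFF, which a complete filter would need to handle separately.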
Jon