[OSM-dev] Odd data in daily diffs (potlatch related?)

Jon Burgess jburgess777 at googlemail.com
Sat Mar 29 13:20:03 GMT 2008


On Sat, 2008-03-29 at 12:41 +0100, Frederik Ramm wrote:
> Hi,
> 
> > In the file daily-20080326-20080327.osc.bz2 there is this relation:
> > 
> >     <relation id="8571" timestamp="2008-03-26T22:05:03Z" user="wiesel111">
> >       <tag k="ESCESC" v=""/>
> >       <tag k="created_by" v="Potlatch 0.8"/>
> >       <tag k="type" v=""/>
> >     </relation>
> > 
> > Those are real escapes "\x1d". Fetching via the API doesn't have them,
> > the osmosis XML parser is barfing on them. Looks like some mismatch
> > between the output and input of osmosis here.
> 
> Seems to be two problems in one, first: how did the key get in there
> in the first place, second: why does it not get exported in a way that
> Osmosis can read.
> 
> I was hoping to fix the diff by simply running "recode" on it and
> instructing it to ignore invalid characters. However, I was surprised
> to see that recode converted the file from UTF8 to UTF16 without
> complaint (and back again to give an identical file). Would running
> one of the many existing "UTF8 sanitizers" have resolved the problem?

Character 27 (ESC, 0x1B) is valid UTF-8, but is not valid as content within an XML
document: http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char
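
To illustrate the difference, here is a minimal Python sketch (not taken
from Osmosis or the planet dump tools; the helper name parses_as_xml is
hypothetical). It assumes Python's standard expat-based parser as a
stand-in for any conforming XML parser, and shows that ESC survives a
UTF-8 round trip while the same character makes the document not
well-formed:

```python
import xml.etree.ElementTree as ET

ESC = chr(27)  # the escape control character, 0x1B

# ESC is a single-byte code point, so it round-trips through UTF-8 unchanged.
# This is why a pure UTF-8 validity check does not flag it.
utf8_round_trip_ok = ESC.encode("utf-8").decode("utf-8") == ESC


def parses_as_xml(text):
    """Return True if the standard library XML parser accepts text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A well-formed tag, and one whose key holds two raw ESC characters.
good_tag = '<tag k="type" v=""/>'
bad_tag = '<tag k="' + ESC + ESC + '" v=""/>'

good_is_xml = parses_as_xml(good_tag)
bad_is_xml = parses_as_xml(bad_tag)
```

This is why an encoding-level tool such as recode sees nothing wrong
with the file: the problem is at the XML level, not the UTF-8 level.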

More details and some Java code which might be useful for Osmosis:
http://cse-mjmcl.cse.bris.ac.uk/blog/2007/02/14/1171465494443.html


I dumped the same data myself with the planet dump tools and they produce
the same invalid output. I have added a line to the planet dump code
to replace this character with a '?'.

Now that I have found the links above I should perhaps add an even
stricter test to drop every character < 32 except for 9, 10 & 13
(tab, line feed and carriage return).
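
A rough sketch of what such a test could look like, written in Python
purely for illustration (the name drop_control_chars and its
replacement parameter are hypothetical and are not the actual planet
dump change). It only handles code points below 32; the XML Char
production linked above excludes a few other ranges that this does not
cover:

```python
import re

# Code points below 32, leaving out tab (9), line feed (10)
# and carriage return (13), which XML allows.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def drop_control_chars(text, replacement=""):
    """Remove (or replace) control characters that XML does not allow."""
    return _CONTROL_CHARS.sub(replacement, text)


# Dropping the characters entirely, as in the stricter test.
cleaned_key = drop_control_chars("\x1b\x1b")

# Substituting a visible marker, as in the '?' replacement.
marked_key = drop_control_chars("\x1b\x1b", replacement="?")
```

Dropping the characters can leave an empty key, as it would for the
relation quoted above, so substituting a marker may be the safer choice
where empty keys cause trouble downstream.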

	Jon





