[OSM-dev] Escaping special characters when writing tags in OSM files with osm-subset.pl & write.pm

Jon Burgess jburgess at uklinux.net
Thu Nov 9 22:37:39 GMT 2006

On Mon, 2006-11-06 at 08:21 +0100, Joerg Ostertag (OSM Munich/Germany)
wrote:
> On Sunday 05 November 2006 01:49, Jon Burgess wrote:
> > I've found that both osm-subset.pl and Geo::OSM::Write fail to escape
> > characters like " & ' in tags, leading to problems when trying to parse
> > the OSM files that they write.
> >
> > The attached patches made them work for me although I still seem to be
> > seeing some UTF-8 related issues (though UTF8sanitizer fixes these up).
> > Does anyone know if there is a better way to be generating valid XML?
> >
> > The patch to osm-subset.pl also fixes it to work with .bz2 compressed
> > planet.osm files.
> What about putting the escaping code into a function inside the
> utils/perl/Geo/ tree and using it from there?
> And can you please add your patch to SVN?
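
For illustration, a shared escaping helper along those lines could look
roughly like the sketch below. The names escape_xml and tag_to_xml are
made up for this example; this is not the code that went into SVN.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): escape the five characters
# that have predefined entities in XML so the result is safe inside a
# quoted attribute value.
sub escape_xml {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # must run first, see note below
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/'/&apos;/g;
    return $text;
}

# Hypothetical helper: format one OSM tag element with escaped key/value.
sub tag_to_xml {
    my ($key, $value) = @_;
    return sprintf('<tag k="%s" v="%s"/>', escape_xml($key), escape_xml($value));
}

print tag_to_xml('name', q{Fish & "Chips"}), "\n";
# prints: <tag k="name" v="Fish &amp; &quot;Chips&quot;"/>
```

The & substitution has to run first, otherwise the ampersands introduced
by the other entities would themselves be escaped a second time.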

I found and fixed another UTF-8 issue with these scripts, which
happened because the file handle was not set to utf8 mode. They now
extract large portions of planet.osm without any UTF-8 issues
(provided the input file is valid UTF-8).
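
To show the kind of problem a missing utf8 mode causes, here is a small
sketch. It assumes the handle in question is an output handle and uses an
in-memory file so it can be run anywhere; bytes_written is a hypothetical
helper, not something from the scripts.

```perl
use strict;
use warnings;

# A character string containing U+00FC (u with diaeresis).
my $name = "M\x{fc}nchen";

# Hypothetical helper: write $text to an in-memory file and return the raw
# bytes that ended up in it, optionally setting utf8 mode on the handle.
sub bytes_written {
    my ($text, $utf8_mode) = @_;
    my $bytes = '';
    open(my $fh, '>', \$bytes) or die "cannot open in-memory file: $!";
    binmode($fh, ':utf8') if $utf8_mode;
    print {$fh} $text;
    close($fh) or die "close failed: $!";
    return $bytes;
}

my $raw  = bytes_written($name, 0);   # single byte 0xFC, not valid UTF-8
my $utf8 = bytes_written($name, 1);   # two bytes 0xC3 0xBC, valid UTF-8

printf "without utf8 mode: %d bytes\n", length $raw;    # 7
printf "with utf8 mode:    %d bytes\n", length $utf8;   # 8
```

Without the binmode call the character goes out as a lone Latin-1 byte,
which a strict XML parser expecting UTF-8 will reject.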

These changes have been added to SVN.

I've not exposed tag2osm from Writer.pm. I think the better long-term
answer is to enhance the writer code to support on-the-fly data output,
as osm-subset.pl already does, so that all the XML writing can be moved
into Writer.pm. This could also be used to reduce the memory usage of my
simplify.pl code.
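
As a rough idea of what on-the-fly output might look like, here is a
minimal sketch. StreamingOsmWriter and its methods are hypothetical names
chosen for this example, not the existing Writer.pm interface, and the
root element omits the attributes a real OSM file carries.

```perl
use strict;
use warnings;

package StreamingOsmWriter;   # hypothetical, not the real Writer.pm API

# Write the XML header immediately; nothing is buffered in the object.
sub new {
    my ($class, $fh) = @_;
    binmode($fh, ':utf8');
    print {$fh} qq{<?xml version="1.0" encoding="UTF-8"?>\n<osm>\n};
    return bless { fh => $fh }, $class;
}

# Escape the characters that are special in XML attribute values.
sub _escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/'/&apos;/g;
    return $text;
}

# Emit one node as soon as it is known, so memory use stays flat.
sub write_node {
    my ($self, $id, $lat, $lon, $tags) = @_;
    my $fh = $self->{fh};
    printf {$fh} qq{  <node id="%d" lat="%s" lon="%s">\n}, $id, $lat, $lon;
    for my $key (sort keys %$tags) {
        printf {$fh} qq{    <tag k="%s" v="%s"/>\n},
            _escape($key), _escape($tags->{$key});
    }
    print {$fh} "  </node>\n";
    return;
}

# Close the root element once all data has been written.
sub finish {
    my ($self) = @_;
    print { $self->{fh} } "</osm>\n";
    return;
}

package main;

# Usage: an in-memory file stands in for the real output file.
my $xml = '';
open(my $out, '>', \$xml) or die "cannot open in-memory file: $!";
my $writer = StreamingOsmWriter->new($out);
$writer->write_node(1, 51.5, -0.1, { name => 'A & B' });
$writer->finish;
close($out) or die "close failed: $!";
print $xml;
```

Because each element is printed as soon as it is handed over, the caller
never has to hold the whole data set in memory before writing.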

