[OSM-dev] Escaping special characters when writing tags in OSM files with osm-subset.pl & write.pm

Fri Nov 10 07:12:18 GMT 2006

...

> I found and fixed another UTF8 issue with these scripts which
> happened because the file handle was not set to utf8 mode. They now
> happily extract large amounts of planet.osm without seeing any UTF8
> issues (provided the input file is valid UTF8).
>
> These changes have been added to SVN.

That fix was for writing; can we do the same for reading?
If so, we might no longer need to run UTF8Sanitize before reading,
which would save 300MB per planet.osm on my hdd :-)
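
For reference, a minimal sketch of what "utf8 mode" on the read side could
look like in Perl. The helper names open_osm_for_reading and
open_osm_for_writing are made up for illustration and are not taken from the
scripts in SVN. Note that this only decodes the input; how it behaves on
invalid byte sequences would need checking before dropping UTF8Sanitize.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the SVN scripts): open a file for reading
# with a decoding layer, so every line read is already a character string.
sub open_osm_for_reading {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path)
        or die "Cannot open $path for reading: $!";
    return $fh;
}

# Hypothetical counterpart for output: characters are encoded to UTF-8
# bytes when printed to the handle.
sub open_osm_for_writing {
    my ($path) = @_;
    open(my $fh, '>:encoding(UTF-8)', $path)
        or die "Cannot open $path for writing: $!";
    return $fh;
}
```

With both ends using the same layer, a tag value should survive a
read/write round trip unchanged, provided the input is valid UTF8.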

> I've not exposed tag2osm from Writer.pm. I'm thinking that the better
> long term answer is for the writer code to be enhanced to support
> on-the-fly data output as is done by osm-subset.pl then all the XML
> writing can be moved over into Writer.pm. This could be used to reduce
> the memory usage of my simplify.pl code too.

If I understand you correctly, you want to split the writer into something
like write_header, write_node, write_segment, write_way, write_end?
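
To make the question concrete, here is a rough sketch of such a split. The
package name StreamWriter, the argument order and the exact XML attribute
layout (including the version string) are assumptions for illustration; this
is not the current Writer.pm interface. Each call prints one element
immediately, so nothing has to be held in memory, and tag keys and values are
escaped on the way out.

```perl
use strict;
use warnings;

{
    package StreamWriter;    # hypothetical name, not the real Writer.pm

    sub new {
        my ($class, $fh) = @_;
        return bless { fh => $fh }, $class;
    }

    # Escape the characters that are special inside XML attribute values.
    sub _escape {
        my ($s) = @_;
        $s =~ s/&/&amp;/g;
        $s =~ s/</&lt;/g;
        $s =~ s/>/&gt;/g;
        $s =~ s/"/&quot;/g;
        return $s;
    }

    # Print one element with its attributes, optional <seg> children and tags.
    sub _element {
        my ($self, $name, $attrs, $tags, $seg_ids) = @_;
        my $fh = $self->{fh};
        my @pairs = @$attrs;
        my $attr_str = '';
        while (my ($k, $v) = splice(@pairs, 0, 2)) {
            $attr_str .= sprintf(' %s="%s"', $k, _escape($v));
        }
        print {$fh} "  <$name$attr_str>\n";
        print {$fh} qq{    <seg id="$_"/>\n} for @{ $seg_ids || [] };
        for my $k (sort keys %{ $tags || {} }) {
            printf {$fh} qq{    <tag k="%s" v="%s"/>\n},
                _escape($k), _escape($tags->{$k});
        }
        print {$fh} "  </$name>\n";
    }

    sub write_header {
        my ($self) = @_;
        # The version attribute is an assumption; use whatever the API expects.
        print { $self->{fh} }
            qq{<?xml version="1.0" encoding="UTF-8"?>\n<osm version="0.3">\n};
    }

    sub write_node {
        my ($self, $id, $lat, $lon, $tags) = @_;
        $self->_element('node', [ id => $id, lat => $lat, lon => $lon ], $tags);
    }

    sub write_segment {
        my ($self, $id, $from, $to, $tags) = @_;
        $self->_element('segment', [ id => $id, from => $from, to => $to ], $tags);
    }

    sub write_way {
        my ($self, $id, $seg_ids, $tags) = @_;
        $self->_element('way', [ id => $id ], $tags, $seg_ids);
    }

    sub write_end {
        my ($self) = @_;
        print { $self->{fh} } "</osm>\n";
    }
}
```

A caller like osm-subset.pl would then call write_header once, one write_*
call per object as it is read, and write_end at the end, passing in a file
handle that has already been opened in utf8 mode.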

-
joerg