[OSM-dev] Escaping special characters when writing tags in OSM files with osm-subset.pl & write.pm
Joerg Ostertag (OSM Munich/Germany)
openstreetmap at ostertag.name
Fri Nov 10 07:12:18 GMT 2006
...
> I found and fixed another UTF8 issue with these scripts, which
> happened because the file handle was not set to utf8 mode. They now
> happily extract large amounts of planet.osm without seeing any UTF8
> issues (provided the input file is valid UTF8).
>
> These changes have been added to SVN.
This was for writing; can we do the same for reading?
That could mean we no longer need to run UTF8Sanitize before
reading, which would save 300MB per planet.osm on my hdd :-)
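To illustrate what "utf8 mode on the read handle" could look like, here is a minimal sketch in Python (not the Perl of the scripts themselves; in Perl the rough counterpart would be an I/O layer such as binmode($fh, ":encoding(UTF-8)")). The function name read_lines_utf8 and the sample bytes are made up for the example. It assumes that replacing invalid byte sequences with U+FFFD during decoding is acceptable; whether that matches what UTF8Sanitize does is an open question.

```python
import io


def read_lines_utf8(stream):
    """Yield text lines from a binary stream, decoding UTF-8 while reading.

    Hypothetical helper: invalid byte sequences become U+FFFD instead of
    aborting the read, so no sanitized copy of the file is written to disk.
    """
    text = io.TextIOWrapper(stream, encoding="utf-8", errors="replace", newline="")
    for line in text:
        yield line


# Made-up sample: one valid line and one line with a stray 0xFF byte.
raw = io.BytesIO(
    b'<tag k="name" v="M\xc3\xbcnchen"/>\n'
    b'<tag k="name" v="bad\xff"/>\n'
)
lines = list(read_lines_utf8(raw))
```

The decoding happens as the data streams in, so the only cost is at read time and no second copy of planet.osm is needed.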
> I've not exposed tag2osm from Writer.pm. I'm thinking that the better
> long term answer is for the writer code to be enhanced to support
> on-the-fly data output as is done by osm-subset.pl, then all the XML
> writing can be moved over into Writer.pm. This could be used to reduce
> the memory usage of my simplify.pl code too.
If I understand you correctly, you want to split the writer into something
like write_header, write_node, write_segment, write_way, write_end?
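Roughly what I have in mind, as a Python sketch rather than actual Writer.pm code: each call writes its element straight to the output handle, so nothing has to be kept in memory. The class name StreamingOsmWriter, the method signatures, and the exact element and attribute layout are assumptions for illustration only. Tag keys and values are escaped with the standard library's quoteattr.

```python
import io
from xml.sax.saxutils import quoteattr


class StreamingOsmWriter:
    """Hypothetical writer that emits each element as soon as it is handed over."""

    def __init__(self, out):
        self.out = out  # any text file-like object opened for writing

    def _write_tags(self, tags):
        # quoteattr escapes &, < and > and picks a quoting style that is
        # safe for the value, so special characters cannot break the XML.
        for key, value in tags.items():
            self.out.write(f"    <tag k={quoteattr(key)} v={quoteattr(value)}/>\n")

    def write_header(self):
        self.out.write('<?xml version="1.0" encoding="UTF-8"?>\n<osm>\n')

    def write_node(self, node_id, lat, lon, tags=None):
        self.out.write(f'  <node id="{node_id}" lat="{lat}" lon="{lon}">\n')
        self._write_tags(tags or {})
        self.out.write("  </node>\n")

    def write_segment(self, seg_id, from_node, to_node, tags=None):
        self.out.write(f'  <segment id="{seg_id}" from="{from_node}" to="{to_node}">\n')
        self._write_tags(tags or {})
        self.out.write("  </segment>\n")

    def write_way(self, way_id, segment_ids, tags=None):
        self.out.write(f'  <way id="{way_id}">\n')
        for seg_id in segment_ids:
            self.out.write(f'    <seg id="{seg_id}"/>\n')
        self._write_tags(tags or {})
        self.out.write("  </way>\n")

    def write_end(self):
        self.out.write("</osm>\n")


# Usage with made-up data; a real caller would pass an open file handle.
buf = io.StringIO()
writer = StreamingOsmWriter(buf)
writer.write_header()
writer.write_node(1, 48.137, 11.575, {"name": "Caf\u00e9 & Bar <1>"})
writer.write_node(2, 48.138, 11.576)
writer.write_segment(10, 1, 2)
writer.write_way(100, [10], {"name": 'The "Ring"'})
writer.write_end()
output = buf.getvalue()
```

With this split, a script like osm-subset.pl could call write_node and friends while it is still reading its input, and all the escaping would live in one place.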
-
joerg