[OSM-dev] Problems parsing planet.osm with Perl XML::Parser
Joerg Ostertag (OSM Munich/Germany)
openstreetmap at ostertag.name
Wed Nov 1 20:29:26 GMT 2006
On Wednesday 01 November 2006 16:19, Ralf Zimmermann wrote:
> I want to write some Perl scripts in order to filter OSM data. As a first
> attempt, I wrote the file osm_stats.pl, which only counts the amount of
> nodes, segments and ways.
>
> With a lot of OSM files, the script works just fine. But when I throw the
> planet file planet-061023.osm on this script, I get the following error
> message:
>
> not well-formed (invalid token) at line 587103, column 37, byte 45215417 at
> /usr/lib/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi/XML/Parser.pm
> line 187
>
> Looking at the planet file shows the following line as being problematic:
> 587102: <node id="543408" lat="51.2714" lon="7.13737" timestamp="2006-02-16T16:43:38+00:00">
> 587103: <tag k="name"
> v="абвгдежзиклмнопÑÑ?ÑÑÑÑÑÑÑÑÑÑÑÑ?ÑÑ?Ð?ÐÐÐÐÐÐÐÐÐÐÐ?ÐÐÐ
> СТУФХЦЧШЩЬЫЪÐЮЯ" />
> 587104: <tag k="class" v="node" />
> 587105: </node>
Did you try the Perl modules that are already in our SVN for this?
They filter invalid UTF-8 out of the data before parsing.
The modules are used in osm2cvs.pl, planet-mirror, osm-pdf-atlas, ...
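
To illustrate the idea of filtering before parsing, here is a minimal sketch. It is not the code from the SVN modules; the helper name clean_utf8_line is hypothetical, and the choice to replace malformed bytes with U+FFFD and to drop control characters that XML 1.0 forbids is an assumption about what a reasonable filter might do.

```perl
#!/usr/bin/perl
# Sketch of a stand-alone UTF-8 filter: reads raw bytes on STDIN and
# writes well-formed UTF-8 on STDOUT. Not the filter from the OSM SVN.
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes one line of raw bytes, returns cleaned bytes.
sub clean_utf8_line {
    my ($bytes) = @_;

    # With the default CHECK value, decode() substitutes U+FFFD for every
    # malformed byte sequence instead of dying.
    my $text = decode('UTF-8', $bytes);

    # Drop control characters that XML 1.0 does not allow
    # (tab, newline and carriage return are kept).
    $text =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F]//g;

    return encode('UTF-8', $text);
}

# Treat both streams as raw bytes so no I/O layer re-encodes the data.
binmode(STDIN);
binmode(STDOUT);

while (my $line = <STDIN>) {
    print clean_utf8_line($line);
}
```

Saved under a name of your choice (filter_utf8.pl here is only an example), it could be run as `perl filter_utf8.pl < planet-061023.osm > planet-clean.osm`, and the cleaned file then handed to the script that uses XML::Parser. Broken names come out with replacement characters instead of stopping the parser.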
-
Joerg