[OSM-dev] Problems parsing planet.osm with Perl XML::Parser

Dave osm at randomjunk.co.uk
Wed Nov 1 16:52:52 GMT 2006


Thought I'd seen something like this before...

http://lists.openstreetmap.org/pipermail/talk/2006-October/008341.html



Ralf Zimmermann wrote:
> I want to write some Perl scripts in order to filter OSM data. As a first attempt, I wrote the file osm_stats.pl, which only counts the number of nodes, segments and ways.
>
> With a lot of OSM files, the script works just fine. But when I run this script on the planet file planet-061023.osm, I get the following error message:
>
> not well-formed (invalid token) at line 587103, column 37, byte 45215417 at /usr/lib/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi/XML/Parser.pm line 187
>
> Looking at the planet file shows the following line as being problematic:
> 587102:   <node id="543408" lat="51.2714" lon="7.13737" timestamp="2006-02-16T16:43:38+00:00">
> 587103:     <tag k="name" v="абвгдежзиклмнопÑÑ?ÑÑÑÑÑÑÑÑÑÑÑÑ?ÑÑ?Ð?ÐÐÐÐÐÐÐÐÐÐÐ?ÐÐРСТУФХЦЧШЩЬЫЪЭЮЯ" />
> 587104:     <tag k="class" v="node" />
> 587105:   </node>
>
> I removed this node from the planet file, but then other lines fail with the same error, for example:
> 1729956:     <tag k="name" v="Handelshøjskole Syd" />
>
> Somehow, the parser does not like the special characters in the name tags. The first example does look somewhat malformed, but the second one looks OK to me.
> So it seems to me that the parser has a problem. How can I solve it?
>
> Has anyone here used XML::Parser and experienced similar issues with special characters?
>
> Ralf
>
>
> --- osm_stats.pl -----------------------------
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use XML::Parser;
>
> my $num_nodes = 0;
> my $num_segments = 0;
> my $num_ways = 0;
>
> # Style 'Subs' calls the sub named after each element (node, segment, way)
> # whenever that element's start tag is seen.
> my $p = XML::Parser->new(Style => 'Subs');
> $p->parsefile($ARGV[0], ProtocolEncoding => 'UTF-8');
>
> print "Statistics of file $ARGV[0]:\n";
> print "Nodes:    $num_nodes\n";
> print "Segments: $num_segments\n";
> print "Ways:     $num_ways\n";
>
> sub node {
>    $num_nodes++;
> }
> sub segment {
>    $num_segments++;
> }
> sub way {
>    $num_ways++;
> }
>
>
>
> _______________________________________________
> dev mailing list
> dev at openstreetmap.org
> http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
>   
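Assuming the failures come from byte sequences that are not valid UTF-8 (for instance a single Latin-1 byte 0xF8 for "ø" where UTF-8 needs the two bytes 0xC3 0xB8), expat will reject them, because it reads the input as UTF-8 by default and ProtocolEncoding => 'UTF-8' asks for the same thing. The sketch below is not from this thread. It is a hypothetical pre-filter (the script name fix_utf8.pl and the subs is_valid_utf8 and repair_utf8 are illustrative) built on Perl's core Encode module. It copies the file line by line, passes valid lines through untouched, replaces malformed sequences with U+FFFD and reports the affected line numbers on STDERR.

```perl
#!/usr/bin/perl
# Hypothetical helper (fix_utf8.pl), not part of osm_stats.pl: copy an OSM
# file to STDOUT, replacing byte sequences that are not valid UTF-8 with
# U+FFFD and reporting the affected line numbers on STDERR.
use strict;
use warnings;
use Encode ();

# Return 1 if $bytes is well-formed (strict) UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC stops
    # decode() from consuming the string it is given.
    return eval {
        Encode::decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    } ? 1 : 0;
}

# Return a UTF-8 byte string in which every malformed sequence of $bytes
# has been replaced by U+FFFD.
sub repair_utf8 {
    my ($bytes) = @_;
    # With the default CHECK value, decode() substitutes U+FFFD for
    # malformed input instead of dying.
    my $chars = Encode::decode('UTF-8', $bytes);
    return Encode::encode('UTF-8', $chars);
}

# Only run as a filter when a file name is given on the command line.
if (@ARGV) {
    my $file = $ARGV[0];
    open(my $in, '<:raw', $file) or die "Cannot open $file: $!\n";
    binmode(STDOUT, ':raw');
    my $line_no = 0;
    while (my $line = <$in>) {
        $line_no++;
        if (is_valid_utf8($line)) {
            print $line;
        } else {
            warn "$file: line $line_no is not valid UTF-8, replaced bad bytes\n";
            print repair_utf8($line);
        }
    }
    close($in);
}
```

Usage might look like `perl fix_utf8.pl planet-061023.osm > planet-fixed.osm`, followed by running osm_stats.pl on planet-fixed.osm. Replacing bad bytes with U+FFFD discards the original characters. If the offending lines turn out to be consistently Latin-1, decoding them as ISO-8859-1 instead would preserve the text, but that is a guess about the data rather than something established here.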




More information about the dev mailing list