[OSM-dev] Problems parsing planet.osm with Perl XML::Parser

Joerg Ostertag (OSM Munich/Germany) openstreetmap at ostertag.name
Wed Nov 1 20:29:26 GMT 2006


On Wednesday 01 November 2006 16:19, Ralf Zimmermann wrote:
> I want to write some Perl scripts in order to filter OSM data. As a first
> attempt, I wrote the file osm_stats.pl, which only counts the amount of
> nodes, segments and ways.
>
> With a lot of OSM files, the script works just fine. But when I throw the
> planet file planet-061023.osm on this script, I get the following error
> message:
>
> not well-formed (invalid token) at line 587103, column 37, byte 45215417 at
> /usr/lib/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi/XML/Parser.pm
> line 187
>
> Looking at the planet file shows the following line as being problematic:
> 587102:   <node id="543408" lat="51.2714" lon="7.13737" timestamp="2006-02-16T16:43:38+00:00">
> 587103:     <tag k="name"
> v="абвгдежзиклмнопÑÑ?ÑÑÑÑÑÑÑÑÑÑÑÑ?ÑÑ?Ð?ÐÐÐÐÐÐÐÐÐÐÐ?ÐÐÐ
> СТУФХЦЧШЩЬЫЪЭЮЯ" />
> 587104:     <tag k="class" v="node" />
> 587105:   </node>

Did you try the Perl modules that are already in our SVN for this?
They filter out invalid UTF-8 before the data reaches the XML parser.

The modules are used in osm2cvs.pl, planet-mirror, osm-pdf-atlas, ...
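
As a rough sketch of the idea only (this is not the code from the SVN modules; the names sanitize_utf8 and filter_stream are made up for illustration), one way to clean the input is to decode every line with the core Encode module, let it replace malformed byte sequences with U+FFFD, drop the control characters XML 1.0 does not allow, and re-encode the result:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper, not taken from the OSM SVN modules.
# Takes raw bytes and returns bytes that are valid UTF-8.
sub sanitize_utf8 {
    my ($bytes) = @_;
    # With FB_DEFAULT, malformed sequences are replaced by U+FFFD
    # instead of making decode() die.
    my $text = decode('UTF-8', $bytes, Encode::FB_DEFAULT());
    # Remove control characters that XML 1.0 forbids
    # (everything below 0x20 except tab, LF and CR).
    $text =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F]//g;
    return encode('UTF-8', $text);
}

# Hypothetical helper: copy $in to $out line by line, sanitizing each line.
# Working line by line keeps memory use small even for a planet file.
sub filter_stream {
    my ($in, $out) = @_;
    binmode $in;     # read raw bytes, no I/O layer decoding
    binmode $out;    # write raw bytes
    while (defined(my $line = <$in>)) {
        print {$out} sanitize_utf8($line);
    }
    return;
}
```

Calling filter_stream(\*STDIN, \*STDOUT) at the end of such a script would turn it into a filter that can be put in front of the parser. Note that the replaced characters are lost, so the broken "name" value would come out with U+FFFD placeholders rather than the original text.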

-
Joerg



