[OSM-dev] Problems parsing planet.osm with Perl XML::Parser
Joerg Ostertag (OSM Munich/Germany)
openstreetmap at ostertag.name
Wed Nov 1 20:29:26 GMT 2006
On Wednesday 01 November 2006 16:19, Ralf Zimmermann wrote:
> I want to write some Perl scripts to filter OSM data. As a first
> attempt, I wrote the file osm_stats.pl, which only counts the number of
> nodes, segments and ways.
> With a lot of OSM files, the script works just fine. But when I run the
> script on the planet file planet-061023.osm, I get the following error:
> not well-formed (invalid token) at line 587103, column 37, byte 45215417 at
> line 187
> Looking at the planet file shows the following line as being problematic:
> 587102: <node id="543408" lat="51.2714" lon="7.13737" timestamp="2006-02-16T16:43:38+00:00">
> 587103: <tag k="name" Ð¡Ð¢Ð£Ð¤Ð¥Ð¦Ð§Ð¨Ð©Ð¬Ð«ÐªÐÐ®Ð¯" />
> 587104: <tag k="class" v="node" />
> 587105: </node>
Did you try using the Perl modules that are already in our SVN for this?
They filter the UTF-8 before parsing.
The modules are used in osm2cvs.pl, planet-mirror, osm-pdf-atlas, ...
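
To illustrate the idea of filtering before parsing, here is a minimal sketch.
It is not the code from our SVN: the helper name clean_utf8_line is made up
for this example, and it assumes the input is read line by line as raw bytes.
It replaces malformed UTF-8 sequences with U+FFFD and drops characters that
XML 1.0 does not allow, using only the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (an assumption for illustration, not the SVN module):
# take one line of raw bytes and return bytes that are valid UTF-8 and
# contain no characters forbidden by XML 1.0.
sub clean_utf8_line {
    my ($bytes) = @_;

    # Decode as strict UTF-8. With FB_DEFAULT, malformed sequences are
    # replaced by the substitution character U+FFFD instead of dying.
    my $text = decode('UTF-8', $bytes, Encode::FB_DEFAULT);

    # Remove control characters that XML 1.0 forbids (tab, LF and CR are
    # allowed), plus the non-characters U+FFFE and U+FFFF.
    $text =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F\x{FFFE}\x{FFFF}]//g;

    # Hand bytes back, ready to be passed on to the XML parser.
    return encode('UTF-8', $text);
}
```

One way to use this with XML::Parser would be to start a non-blocking parse
with parse_start, pass each cleaned line to parse_more, and finish with
parse_done, so the planet file never has to be held in memory as a whole.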