[OSM-dev] Split osm line with perl

Ævar Arnfjörð Bjarmason avarab at gmail.com
Sun Nov 29 17:23:41 GMT 2009


On Sun, Nov 29, 2009 at 12:10, Simone Cortesi <simone at cortesi.com> wrote:
> On Sun, Nov 29, 2009 at 12:16, Maarten Deen <mdeen at xs4all.nl> wrote:
>> I've tried a few things, but I'm not fluent in perl. My problem at the moment is
>> that splitting a line on the space character seems logical, but you run into
>> problems if a value has a space in it.
>
> wouldn't it be wiser to use a DOM/XML parser, which is natively able to interpret XML?

Yes it would. Unfortunately some Perl programmers seem to be unaware
of the existence of CPAN and insist on solving non-trivial problems
like XML parsing over and over again with the wrong tools, namely
regular expressions.
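
To make the problem from the quoted message concrete, here is a small
sketch of what splitting on spaces does to a tag whose value contains a
space. The sample line and the "Via Roma" value are made up for
illustration; they are not taken from any real OSM file.

```perl
use strict;
use warnings;

# Hypothetical OSM tag line whose value contains a space.
my $line = '<tag k="name" v="Via Roma"/>';

# Naive approach: split on single spaces.
my @parts = split / /, $line;

# The value is cut in two, so the v="..." attribute is no longer intact.
print scalar(@parts), " pieces:\n";
print "  [$_]\n" for @parts;
```

The v attribute ends up spread over two pieces, so any code that expects
one piece per attribute gets the wrong answer.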

If you want a Perl one-liner to get all <tag> values from an OSM file
here's one on the house that isn't insane:

    perl -CI -MXML::Parser -E 'my $x = XML::Parser->new(Handlers => {
        Start => sub { my ($p, $e, %kv) = @_; return unless $e eq "tag";
        say "$kv{k} = $kv{v}" } }); $x->parse(*STDIN)' < File.osm

This could probably be done in an easier way using something higher level
than XML::Parser (which is just a raw interface to expat) but I'm not
that familiar with Perl XML parsing. If I were to acquaint myself with
it I'd be sure not to start by writing the millionth buggy tagsoup
parser using regexes though.
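
As a rough idea of what "higher level" might look like, here is a
minimal sketch using XML::LibXML from CPAN. That choice of module is my
assumption, not something I have tested against real OSM data, and the
inline sample document is invented for the example. Note that this
loads the whole document into memory, so for a large OSM file a
streaming parser like the XML::Parser one-liner is likely the safer
choice.

```perl
use strict;
use warnings;
use feature 'say';
use XML::LibXML;    # assumed to be installed from CPAN

# Hypothetical, tiny OSM-like document used only for illustration.
my $xml = <<'XML';
<osm version="0.6">
  <node id="1" lat="45.0" lon="9.0">
    <tag k="name" v="Via Roma"/>
    <tag k="highway" v="residential"/>
  </node>
</osm>
XML

# Parse the string into a DOM tree.
my $doc = XML::LibXML->load_xml(string => $xml);

# Collect the k/v attribute pair of every <tag> element via XPath.
my @pairs = map { [ $_->getAttribute('k'), $_->getAttribute('v') ] }
            $doc->findnodes('//tag');

say "$_->[0] = $_->[1]" for @pairs;
```

The value "Via Roma" comes back in one piece because the parser, not a
split on spaces, decides where an attribute ends.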
