[OSM-dev] Anyway to split or simplify an .osm file from the command line?

Frederik Ramm frederik at remote.org
Mon Sep 21 01:02:18 BST 2009

Apollinaris Schoell wrote:
> Frederik,
> full agreement to all this arguments. but one of the problems is none of 
> this is documented. the import page in the wiki is old has no best 
> practice like you recommend here. and we have no tools which makes it 
> easier to deal with such data.
> If others agree the wiki should be updated.
> We will need also more cleanup tools if such data is already uploaded. 
> You mentioned in the other thread that you have used Postgis to manage 
> such a task. Can you share that tools, scripts, commands? 

The problem with that is that every situation will require slightly 
different handling, and I have never bothered to create a more general 
approach, but I'll share what I have. (Note that somebody just wrote 
about multipolygon support in polyshp2osm which may be a more convenient 

I recently did an import of the quarter boundaries of the city of 
Dresden which were given to us as a shapefile with polygons.

Step 1: Load shape in PostGIS; say the table is called "dresden" and say 
that there is an attribute named "sicad_txt" that has some kind of 
numeric identifier for the quarter.

Step 2: Create a new table that contains all the border lines between 
two polygons.

CREATE TABLE multilinestrings (

SELECT AddGeometryColumn('multilinestrings', 'geom', 4326, 

INSERT INTO multilinestrings (gid1, gid2, geom) SELECT 
int4(a.sicad_txt), int4(b.sicad_txt), 
ST_Multi(ST_Intersection(a.the_geom, b.the_geom)) FROM dresden a, 
dresden b WHERE a.sicad_txt<b.sicad_txt AND ST_Touches(a.the_geom, 

Now because this only creates lines wherever two polygons touch, it has 
not created lines for the outer border of the outermost ring of city 
quarters (where no other quarter touches), so I did somewhat un-elegantly:

create table outline (id serial not null unique);
SELECT AddGeometryColumn('outline','geom',4326,'MULTILINESTRING',2);

insert into outline (id,geom) values(0, (select 
ST_Boundary(ST_Multi(ST_Union(the_geom))) as geom from dresden));

insert into multilinestrings(gid1,gid2,geom) select 
int4(a.sicad_txt),0,ST_Multi(ST_Intersection(a.the_geom, b.geom)) FROM 
dresden a, outline b where ST_Touches(a.the_geom,b.geom);

Because we have "multilinestrings" which are collections of lines, and 
what we really want is individual lines, we need to split them like so:

  id serial not null,
SELECT AddGeometryColumn('ways', 'geom', 4326, 'LINESTRING', 2);

INSERT INTO ways (geom, gid1, gid2) SELECT 
(ST_Dump(ST_LineMerge(geom))).geom, gid1, gid2 FROM multilinestrings;

This results in a table "ways" which has individual borders between two 
regions (quarters in this case), and each border has a "gid1" and "gid2" 
attribute which has the numeric ids of the polygons on each side of it 
(or 0 if it is an outer border).

Step 3, dump this table as a shape file and continue with a perl script 
that I put together from pieces of shp2osm.pl:

use Geo::ShapeFile;
use HTML::Entities qw(encode_entities_numeric);
use strict;

my $source="whatever your source is";

open(N, ">nodes.osm") or die;
open(W, ">ways.osm") or die;
open(R, ">relations.osm") or die;

my $globali=-1;

my $areadata = {};
my $nodestorage = {};
my $way_output = {};
my $shpf = Geo::ShapeFile->new("ways");


open(N, "names.txt") or die;
     my ($id, $name) = split(";");
     my $ways = $areadata->{$id};
     printf "read area data %d \n", $id;
     printf "num ways: %s\n", scalar(@$ways);

     my $rings = [];
     while (scalar @$ways)
         my $ring = [ shift @$ways ];
         my $first = $ring->[0]->{firstnode};
         my $last = $ring->[0]->{lastnode};
         my $minlat = $ring->[0]->{minlat};
         my $maxlat = $ring->[0]->{maxlat};
         printf "init ring with way %d\n", $ring->[0]->{id};

         my $loop = 1;
         my $found;
         while($first != $last)
             printf "current ring: %d items, %d to %d\n", 
scalar(@$ring), $first, $last;
             for(my $i=0; $i<scalar @$ways; $i++)
                 my $way = $ways->[$i];
                 #printf "   way %d: %d to %d\n", $way->{id}, 
$way->{firstnode}, $way->{lastnode};
                 if ($way->{firstnode} == $last or $way->{lastnode} == 
                     #print "      insert back\n";
                     push(@$ring, $way);
                     $last = ($way->{lastnode} == $last) ? 
$way->{firstnode} : $way->{lastnode};
                     $minlat = $way->{minlat} if ($way->{minlat} < 
                     $maxlat = $way->{maxlat} if ($way->{maxlat} > 
                     splice @$ways, $i, 1;
                     next RING;
                 elsif ($way->{lastnode} == $first or $way->{firstnode} 
== $first)
                     #print "      insert front\n";
                     unshift(@$ring, $way);
                     $first= ($way->{firstnode} == $first) ? 
$way->{lastnode} : $way->{firstnode};
                     $minlat = $way->{minlat} if ($way->{minlat} < 
                     $maxlat = $way->{maxlat} if ($way->{maxlat} > 
                     splice @$ways, $i, 1;
                     next RING;
                     #print "      no insert\n";
             printf "problem: first=$first last=$last and nothing to 
             last RING;
         printf ("store ring; ways left: %d\n", scalar @$ways);
         push(@$rings, { ring => $ring, minlat => $minlat, maxlat => 

     my $outer;
     foreach my $ring(@$rings)
         if (!defined($outer) or ($outer->{minlat} > $ring->{minlat}))

     printf R "<relation id='%d'>\n", $globali--;
     foreach (@{$outer->{ring}})
         printf R "   <member type='way' role='outer' ref='%d' />\n", 

     printf R "   <tag k='name' v='%s' />\n", $name;
     printf R "   <tag k='source' v='%s' />\n", $source;
     printf R "   <tag k='boundary' v='administrative' />\n";
     printf R "   <tag k='admin_level' v='9' />\n";
     printf R "   <tag k='type' v='multipolygon' />\n";
     printf R "</relation>\n";
foreach (keys %$way_output)
     my $w=$way_output->{$_};
     print W $w;
close W;

sub tags_out {
     my ($tags) = @_;
     my %tags = $tags ? %$tags : ();
     delete $tags{'_deleted'} unless $tags{'_deleted'};
     while ( my ( $k, $v ) = each %tags ) {
         my $key = encode_entities_numeric($k);
         my $val = encode_entities_numeric($v);
         print '    <tag k="'. $key .'" v="'. $val ."\"/>\n" if $val;

sub node_out {
     my ($lon, $lat, $tags) = @_;
     my $id = $nodestorage->{"$lat|$lon"};
     if (!defined($id))
         $id = $globali--;
         print N "  <node id='$id' lat='$lat' lon='$lon' />\n";
         $nodestorage->{"$lat|$lon"} = $id;

sub way_out {
     my ($nodes, $tags) = @_;
     my $id = $globali--;

     my $o;
     my $j=0;
     my $wayset=[];
         $o=sprintf("  <way id='%s%02d'>\n", $id, $j);
         my $z;
         for ($z=0; $z<scalar(@$nodes) && $z<1000; $z++) {
             $o.="    <nd ref='$nodes->[$z]' />\n";
         $o.="    <tag k='boundary' v='administrative' />\n";
         # this is hardcoded for each individual import
         if ($tags->{GID1}==0||$tags->{GID2}==0)
             $o.="    <tag k='admin_level' v='6' />\n";
         } else {
             $o.="    <tag k='admin_level' v='9' />\n";
         $o.=sprintf "    <tag k='source' v='%s' />\n", $source;
         $o.="  </way>\n";
         my $w = { firstnode => $nodes->[0], lastnode => $nodes->[$z-1], 
id => sprintf("%s%02d", $id, $j) };
         push(@$wayset, $w);
         $way_output->{sprintf("%s%02d", $id,$j)}=$o;
     $areadata->{$tags->{"GID1"}} = [] unless 
defined($areadata->{$tags->{"GID1"}} );
     push(@{$areadata->{$tags->{"GID1"}}}, @$wayset);
     $areadata->{$tags->{"GID2"}} = [] unless 
defined($areadata->{$tags->{"GID2"}} );
     push(@{$areadata->{$tags->{"GID2"}}}, @$wayset);

sub process_shape {
     my ($shpf) = @_;
     for (1 .. $shpf->shapes()) {
         my $shp = $shpf->get_shp_record($_);
         my %dbf = $shpf->get_dbf_record($_);
         for (1 .. $shp->num_parts) {
             my $nodes;
             for my $pt(map([ $_->X(), $_->Y()], $shp->get_part($_))) {
                 push @$nodes, node_out @$pt;
             way_out($nodes, \%dbf);

This script also needs a file "names.txt" that contains the numeric IDs 
of the regions and assigned names, separated by semicolons. Needless to 
say this does not have any error handling and if it doesn't do what you 
expect it to do then you need to find out why and fix it ;-) but it 
worked quite well for me.

I'm afraid this is not good enough for people who want a manual to work 
with but maybe it is a start which others can use to create such a 
manual. But as I said, maybe this polyshp2osm thing already does all of 
this in a sensible way, I haven't had the time to check it out yet.


More information about the dev mailing list