[OSM-dev] Anyway to split or simplify an .osm file from the command line?
Frederik Ramm
frederik at remote.org
Mon Sep 21 01:02:18 BST 2009
Apollinaris Schoell wrote:
> Frederik,
>
> full agreement to all this arguments. but one of the problems is none of
> this is documented. the import page in the wiki is old has no best
> practice like you recommend here. and we have no tools which makes it
> easier to deal with such data.
> If others agree the wiki should be updated.
>
> We will need also more cleanup tools if such data is already uploaded.
> You mentioned in the other thread that you have used Postgis to manage
> such a task. Can you share that tools, scripts, commands?
The problem with that is that every situation will require slightly
different handling, and I have never bothered to create a more general
approach, but I'll share what I have. (Note that somebody just wrote
about multipolygon support in polyshp2osm which may be a more convenient
option!)
I recently did an import of the quarter boundaries of the city of
Dresden which were given to us as a shapefile with polygons.
Step 1: Load shape in PostGIS; say the table is called "dresden" and say
that there is an attribute named "sicad_txt" that has some kind of
numeric identifier for the quarter.
Step 2: Create a new table that contains all the border lines between
two polygons.
CREATE TABLE multilinestrings (
id SERIAL NOT NULL UNIQUE,
gid1 INTEGER NOT NULL,
gid2 INTEGER NOT NULL
);
SELECT AddGeometryColumn('multilinestrings', 'geom', 4326,
'MULTILINESTRING', 2);
INSERT INTO multilinestrings (gid1, gid2, geom) SELECT
int4(a.sicad_txt), int4(b.sicad_txt),
ST_Multi(ST_Intersection(a.the_geom, b.the_geom)) FROM dresden a,
dresden b WHERE a.sicad_txt<b.sicad_txt AND ST_Touches(a.the_geom,
b.the_geom);
Now because this only creates lines wherever two polygons touch, it has
not created lines for the outer border of the outermost ring of city
quarters (where no other quarter touches), so I did somewhat un-elegantly:
create table outline (id serial not null unique);
SELECT AddGeometryColumn('outline','geom',4326,'MULTILINESTRING',2);
insert into outline (id,geom) values(0, (select
ST_Boundary(ST_Multi(ST_Union(the_geom))) as geom from dresden));
insert into multilinestrings(gid1,gid2,geom) select
int4(a.sicad_txt),0,ST_Multi(ST_Intersection(a.the_geom, b.geom)) FROM
dresden a, outline b where ST_Touches(a.the_geom,b.geom);
Because we have "multilinestrings" which are collections of lines, and
what we really want is individual lines, we need to split them like so:
CREATE TABLE ways (
id serial not null,
gid1 INTEGER NOT NULL,
gid2 INTEGER NOT NULL
);
SELECT AddGeometryColumn('ways', 'geom', 4326, 'LINESTRING', 2);
INSERT INTO ways (geom, gid1, gid2) SELECT
(ST_Dump(ST_LineMerge(geom))).geom, gid1, gid2 FROM multilinestrings;
This results in a table "ways" which has individual borders between two
regions (quarters in this case), and each border has a "gid1" and "gid2"
attribute which has the numeric ids of the polygons on each side of it
(or 0 if it is an outer border).
Step 3, dump this table as a shape file and continue with a perl script
that I put together from pieces of shp2osm.pl:
use Geo::ShapeFile;
use HTML::Entities qw(encode_entities_numeric);
use strict;
my $source="whatever your source is";
open(N, ">nodes.osm") or die;
open(W, ">ways.osm") or die;
open(R, ">relations.osm") or die;
my $globali=-1;
my $areadata = {};
my $nodestorage = {};
my $way_output = {};
my $shpf = Geo::ShapeFile->new("ways");
process_shape($shpf);
close(N);
open(N, "names.txt") or die;
while(<N>)
{
chomp;
my ($id, $name) = split(";");
my $ways = $areadata->{$id};
printf "read area data %d \n", $id;
printf "num ways: %s\n", scalar(@$ways);
my $rings = [];
while (scalar @$ways)
{
my $ring = [ shift @$ways ];
my $first = $ring->[0]->{firstnode};
my $last = $ring->[0]->{lastnode};
my $minlat = $ring->[0]->{minlat};
my $maxlat = $ring->[0]->{maxlat};
printf "init ring with way %d\n", $ring->[0]->{id};
my $loop = 1;
my $found;
RING:
while($first != $last)
{
printf "current ring: %d items, %d to %d\n",
scalar(@$ring), $first, $last;
for(my $i=0; $i<scalar @$ways; $i++)
{
my $way = $ways->[$i];
#printf " way %d: %d to %d\n", $way->{id},
$way->{firstnode}, $way->{lastnode};
if ($way->{firstnode} == $last or $way->{lastnode} ==
$last)
{
#print " insert back\n";
push(@$ring, $way);
$last = ($way->{lastnode} == $last) ?
$way->{firstnode} : $way->{lastnode};
$minlat = $way->{minlat} if ($way->{minlat} <
$minlat);
$maxlat = $way->{maxlat} if ($way->{maxlat} >
$maxlat);
splice @$ways, $i, 1;
next RING;
}
elsif ($way->{lastnode} == $first or $way->{firstnode}
== $first)
{
#print " insert front\n";
unshift(@$ring, $way);
$first= ($way->{firstnode} == $first) ?
$way->{lastnode} : $way->{firstnode};
$minlat = $way->{minlat} if ($way->{minlat} <
$minlat);
$maxlat = $way->{maxlat} if ($way->{maxlat} >
$maxlat);
splice @$ways, $i, 1;
next RING;
}
else
{
#print " no insert\n";
}
}
printf "problem: first=$first last=$last and nothing to
insert\n";
last RING;
}
printf ("store ring; ways left: %d\n", scalar @$ways);
push(@$rings, { ring => $ring, minlat => $minlat, maxlat =>
$maxlat});
}
my $outer;
foreach my $ring(@$rings)
{
if (!defined($outer) or ($outer->{minlat} > $ring->{minlat}))
{
$outer=$ring;
}
}
printf R "<relation id='%d'>\n", $globali--;
foreach (@{$outer->{ring}})
{
printf R " <member type='way' role='outer' ref='%d' />\n",
$_->{id};
}
printf R " <tag k='name' v='%s' />\n", $name;
printf R " <tag k='source' v='%s' />\n", $source;
printf R " <tag k='boundary' v='administrative' />\n";
printf R " <tag k='admin_level' v='9' />\n";
printf R " <tag k='type' v='multipolygon' />\n";
printf R "</relation>\n";
}
close(R);
foreach (keys %$way_output)
{
my $w=$way_output->{$_};
print W $w;
}
close W;
sub tags_out {
my ($tags) = @_;
my %tags = $tags ? %$tags : ();
delete $tags{'_deleted'} unless $tags{'_deleted'};
while ( my ( $k, $v ) = each %tags ) {
my $key = encode_entities_numeric($k);
my $val = encode_entities_numeric($v);
print ' <tag k="'. $key .'" v="'. $val ."\"/>\n" if $val;
}
}
sub node_out {
my ($lon, $lat, $tags) = @_;
my $id = $nodestorage->{"$lat|$lon"};
if (!defined($id))
{
$id = $globali--;
print N " <node id='$id' lat='$lat' lon='$lon' />\n";
$nodestorage->{"$lat|$lon"} = $id;
}
$id;
}
sub way_out {
my ($nodes, $tags) = @_;
my $id = $globali--;
my $o;
my $j=0;
my $wayset=[];
while(scalar(@$nodes)>1)
{
$j++;
$o=sprintf(" <way id='%s%02d'>\n", $id, $j);
my $z;
for ($z=0; $z<scalar(@$nodes) && $z<1000; $z++) {
$o.=" <nd ref='$nodes->[$z]' />\n";
}
$o.=" <tag k='boundary' v='administrative' />\n";
# this is hardcoded for each individual import
if ($tags->{GID1}==0||$tags->{GID2}==0)
{
$o.=" <tag k='admin_level' v='6' />\n";
} else {
$o.=" <tag k='admin_level' v='9' />\n";
}
$o.=sprintf " <tag k='source' v='%s' />\n", $source;
$o.=" </way>\n";
my $w = { firstnode => $nodes->[0], lastnode => $nodes->[$z-1],
id => sprintf("%s%02d", $id, $j) };
push(@$wayset, $w);
$way_output->{sprintf("%s%02d", $id,$j)}=$o;
splice(@$nodes,0,$z-1);
}
$areadata->{$tags->{"GID1"}} = [] unless
defined($areadata->{$tags->{"GID1"}} );
push(@{$areadata->{$tags->{"GID1"}}}, @$wayset);
$areadata->{$tags->{"GID2"}} = [] unless
defined($areadata->{$tags->{"GID2"}} );
push(@{$areadata->{$tags->{"GID2"}}}, @$wayset);
}
sub process_shape {
my ($shpf) = @_;
for (1 .. $shpf->shapes()) {
my $shp = $shpf->get_shp_record($_);
my %dbf = $shpf->get_dbf_record($_);
for (1 .. $shp->num_parts) {
my $nodes;
for my $pt(map([ $_->X(), $_->Y()], $shp->get_part($_))) {
push @$nodes, node_out @$pt;
}
way_out($nodes, \%dbf);
}
}
}
This script also needs a file "names.txt" that contains the numeric IDs
of the regions and assigned names, separated by semicolons. Needless to
say this does not have any error handling and if it doesn't do what you
expect it to do then you need to find out why and fix it ;-) but it
worked quite well for me.
I'm afraid this is not good enough for people who want a manual to work
with but maybe it is a start which others can use to create such a
manual. But as I said, maybe this polyshp2osm thing already does all of
this in a sensible way, I haven't had the time to check it out yet.
Bye
Frederik
More information about the dev
mailing list