[Routing] "Connected Nodes"
Frederik Ramm
frederik at remote.org
Tue Oct 9 11:59:29 BST 2007
Hi,
> Let the database do the work, don't transfer data to your script
> unless absolutely necessary.
> To find the 10 longest ways, do something like this (depending on
> your database schema):
As far as I understood it, the OP was not after the longest way, but
after the largest interconnected set of ways.
I offer the following idea (working with a 0.5 style database - I
executed it on my test database used for the migration):
First, count the number of ways using each node:
alter table way_nodes add key (node_id);
(this takes half an hour)
select group_concat(distinct id separator ",") from way_nodes
group by node_id into outfile "/tmp/foo.out";
(this takes 5 minutes but would likely take days without the previous
step)
Now we have a file that has one line for each node, containing a
comma-separated list of way ids that node belongs to. In most cases it
will only be one number (these are uninteresting).
A little perl script is run on the result to sort these ways into
contiguous groups. Luckily since we're only looking at ways, this
script needs RAM for about 5 million big integers stored in hashes.
Takes about 80 minutes and needs 1.5 GB of RAM:
#!/usr/bin/perl
use strict;
my $group = {}; # key: way id, value: group number
my $reverse = {}; # key: group number, value: array of way ids
my $groupcnt = 0;
while(<>)
{
chomp;
my @ways = split(/,/);
next if scalar(@ways) < 2;
# number below is wc -l of input file divided by 100
printf STDERR "\r%3d%%...", $./ 48202 if ($.%100==0);
my $matching = {};
foreach my $way(@ways)
{
$matching->{$group->{$way}} = 1 if defined($group->{$way});
}
my @matching = keys %$matching;
# 0 matching means none of the ways had been seen yet.
# 1 matching means the way(s) are in exactly 1 group.
# >1 matching means we have to join existing groups.
if (scalar(@matching) == 0)
{
$groupcnt++;
$group->{$_}=$groupcnt foreach @ways;
$reverse->{$groupcnt} = \@ways;
}
else
{
my $grp = shift @matching;
foreach (@ways)
{
unless(defined($group->{$_}))
{
$group->{$_} = $grp;
push(@{$reverse->{$grp}}, $_);
}
}
foreach my $othergrp(@matching)
{
$group->{$_} = $grp foreach @{$reverse->{$othergrp}};
push(@{$reverse->{$grp}}, @{$reverse->{$othergrp}});
delete $reverse->{$othergrp};
}
}
}
my $i = 1;
foreach my $group(sort { scalar(@{$reverse->{$b}}) <=> scalar(@
{$reverse->{$a}}) } keys %$reverse)
{
printf "%3d. %d contiguous ways\n", $i++, scalar(@{$reverse->
{$group}});
last if ($i == 10);
}
printf "total: %d groups\n\n\n", scalar(keys %$reverse);
$i = 1;
foreach my $group(sort { scalar(@{$reverse->{$b}}) <=> scalar(@
{$reverse->{$a}}) } keys %$reverse)
{
printf "%3d. %d:\n", $i++, scalar(@{$reverse->{$group}});
print join(",", @{$reverse->{$group}});
print "\n";
last if ($i == 1000);
}
The results:
1. 1759721 contiguous ways
2. 397027 contiguous ways
3. 175187 contiguous ways
4. 71034 contiguous ways
5. 50414 contiguous ways
6. 46719 contiguous ways
7. 37558 contiguous ways
8. 37279 contiguous ways
9. 35668 contiguous ways
total: 26651 groups
(The total is slightly skewed since ways that don't connect to
anything are not counted but who cares)
So, what is this mega group of 1.7 million ways? Let's randomly
request one:
<osm version="0.5" generator="OpenStreetMap server">
<way id="6948402" visible="true" timestamp="2007-09-18T16:28:18
+01:00" user="AND">
<nd ref="43434791"/>
<nd ref="43436406"/>
<tag k="name" v="De Duivenekker"/>
<tag k="AND_nosr_r" v="15553374"/>
<tag k="highway" v="unclassified"/>
</way>
</osm>
AND data plus everything in Europe connected from there, it seems.
And now one from the second group of 400k ways:
<way id="4487247" visible="true" timestamp="2007-05-29T18:34:30
+01:00" user="peter_james">
<nd ref="27505795"/>
<nd ref="27505796"/>
<nd ref="27505797"/>
<nd ref="27505798"/>
<nd ref="27505799"/>
<tag k="note" v="Bankhead Lane on sign at this end."/>
<tag k="maxweight" v="7.5"/>
<tag k="highway" v="unclassified"/>
<tag k="created_by" v="JOSM"/>
<tag k="maxspeed" v="48"/>
<tag k="name" v="Bank Head Lane"/>
</way>
<node id="27505795" lat="53.7262972" lon="-2.6349497"
user="peter_james" visible="true" timestamp="2007-08-09T17:22:53+01:00">
Firmly UK territory then.
The third largest group of 175k ways:
<way id="7181789" visible="true" timestamp="2007-09-19T08:05:21
+02:00" user="Osmosis System User">
<nd ref="54466656"/>
<nd ref="54425959"/>
<tag k="tiger:source" v="tiger_import_dch_v0.6_20070809"/>
<tag k="tiger:county" v="San Bernardino, CA"/>
<tag k="highway" v="service"/>
<tag k="tiger:tlid" v="144677267"/>
<tag k="tiger:reviewed" v="no"/>
<tag k="tiger:upload_uuid" v="bulk_upload.pl-793fb12a-827c-46ef-
ab80-4a7bdd9a1fdb"/>
<tag k="access" v="private"/>
<tag k="tiger:cfcc" v="A74"/>
</way>
California area, it seems.
I'd happily make available the detailed results but they are rather
worthless because due to automatic way creation during the 0.5
changeover, my way IDs are not necessarily the same as the one in
production. I might repeat the exercise with proper data after the
next planet dump.
Bye
Frederik
--
Frederik Ramm ## eMail frederik at remote.org ## N49°00.09' E008°23.33'
More information about the Routing
mailing list