[Routing] [OSM-dev] pre-compute routing

Robert (Jamie) Munro rjmunro at arjam.net
Tue Oct 2 12:34:21 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jon Bright wrote:
> Hi,
> 
> Steve Coast wrote:
>> (please reply to the new routing list, sent to dev so that you know  
>> it's there)
>>
>> We have about 4 million ways.
> 
> ...but we don't need to route between them all.  The thing to route 
> between, I think, would be "fork" nodes - i.e. nodes where more than two 
> ways join.  I have a query running to find these (as the first stage to 
> producing more routing-optimised data).  At the start and end of the 
> route, assuming the starting/ending node isn't itself a fork node, you 
> end up with a choice of (at most) four routes (from the next fork node 
> in each direction on the start and end ways) and pick the 
> shortest/fastest best of those four.
> 
> My query isn't finished yet (I have ideas for a much faster version, but 
> I want this one to get finished first).  But I'm going to go out on a 
> limb and say that we have about a million fork nodes.
> 
>> An all pairs routing graph has therefore 16 million entries. We  
>> assume a route from a to b may be different than from b to a due to  
>> one way streets, turn restrictions etc.
> 
> I think your maths here is wrong (if I'm wrong, a lot of the rest of 
> this mail is going to be irrelevant).  If I had 4 nodes I was routing 
> between (and assuming I respect one-way restrictions), I'd have
> 
> 1->2, 1->3, 1->4, 2->1, 2->3, 2->4, 3->1, 3->2, 3->4, 4->1, 4->2, 4->3
> 
> = 12 routes.  Basically, if I have N nodes, I have N*(N-1) routes.  For 
> your figure of 4 million, I end up with 16 trillion routes.  This enters 
> the realm of the unrealistic storagewise (at least for OSM, probably not 
> for Google).  I'm not sure you need to be quite so pessimistic about the 
> one-way stuff, though - if a route's not affected by one-way stuff, you 
> could just store one of the directions and use some simple logic to 
> retrieve that for the other direction too if the direction you're 
> looking for doesn't have a route.
> 
> For 4 nodes, I then have
> 
> 1->2, 1->3, 1->4, 2->3, 2->4, 3->4
> 
> = 6 routes.  For N nodes, (N*(N-1))/2.  Assume 25% of routes have a 
> differing reverse direction, then we have ((N*(N-1))/2)*1.25 -> 
> 5(N*(N-1))/8.
> 
> Going with the million-fork-node assumption from above, all-pairs would 
> be 5*(1000000*999999)/8 ~= 625,000,000,000 routes.  Assume 75% of these 
> are unroutable (different continents, mixing of cycle paths and 
> motorways, etc.) and you have 156,250,000,000 routes.
> 
>> Almost every route is going to be something like "take street a,b,c,  
>> get on a motorway, come off and do streets d,e,f". That is, almost  
>> every route is going to be fairly simple. 
> 
> Not if you also do shortest route - these tend to be significantly 
> longer.  I've also had ideas about things like "scenic route" (weight 
> ways on scenicness based on surrounding streets, green areas, etc.), 
> "shortest route, but keep it to vaguely sane roads - i.e. 
> non-residential, non-track, etc.".  Some kind of 
> use-trains-or-flights-when-appropriate thing would rule too.
> 
>> Let us a assign L to be the  
>> length of route in number of ways traversed. Of course many routes  
>> won't exist (you can't drive across oceans).
> 
> Even assuming you do the famous Google "swim across Atlantic ocean" 
> thing, this is effectively the concatenation of two routes.
> 
>> Let us assign N to be the number of types of route, and lets say N is  
>> 3 for car, walk, cycle.
> 
> I'd guess at N=7 or more.  Car shortest, car scenic, car fastest.  Cycle 
> normal, cycle avoid-fast-roads (or something).  Walk.  The 
> use-trains-or-flights-when-appropriate thing.  Doubtless other cool 
> things that people think of.
> 
>> Let us assign B to be the the number of bytes per way id. B=2, 16  
>> bits can currently encode all ways.
> 
> Er, how?  We have more than 65535 ways - I make that you need at least 
> 22 bits as of right now (and 23 before very long).  However, since 
> you're (presumably) not going to be storing the descriptions, you need 
> to do a query on ways anyway.  By making your query return slightly more 
> data, you can significantly save on route storage.  At each fork node, 
> get all the ways which proceed from that node, sorted by way ID.  You 
> can then store routes as:
> 
> Start at node 12345
> First fork node: take way with the lowest ID = 0
> Second fork node: take way with the third lowest ID = 2
> etc.
> 
> Assuming the most ways joining at a single node is 16 (if anything, 8 
> seems more likely), you need 4 bits per step of the route (making 16 the 
> shinier choice, since 4 bits/step fits much more nicely into bytes).
> 
>> The total disk space to store all routes is therefore
>>
>> ~= 4 * 10e6 * (L * B) * N
> 
> By my calculations thus far:
> 
> ~= 1.5625 * 10e11 * (L * 0.5) * N
> 
>> I'm going to go out on a limb here, and almost any route I've got  
>> directions for is about 10 steps. Some will be more of course, but  
>> many will be much less.
> 
> I'd go with 20, bearing in mind the shortest-route thing and so on. 
> It's easy to get 50+ steps on these routes, so that seems more 
> reasonable to me.

I think we only need to store the first one. The second one is the same
as the route from that node to the destination. I.e. if the route is
A->B->C->D, then we just need to store A->D=B, then lookup B->D=C, then
C->D=D.

In many, probably most, cases, the next node to go to will be the same
no matter which type of route you are taking, so we only need to store
different routes for different types of route when they actually differ.

We could avoid storing a lot of the same data by using the quad-tile
indexes. So "from A to any node in quad-tile n use b". As the quad tiles
get further away, we can use larger ones - it is likely that I need to
take the same first node no matter where I am going in the north of the
country. We might be able to get it down to something like only a few Kb
of information per node.

When I first heard the idea and thought "Steve, 4 million^2 is a bit
more than 16 million", I thought the idea was completely bonkers. Now
this is starting to sound viable.

Robert (Jamie) Munro
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHAiy6z+aYVHdncI0RAmB+AJ9BIMPNk5e7i+LfoN2/TxxJLog1OwCeI6n4
iZiBrl72J4ktVfQAAoDwCrU=
=rny/
-----END PGP SIGNATURE-----




More information about the Routing mailing list