[Routing] [OSM-dev] pre-compute routing
Jon Bright
jon at siliconcircus.com
Sun Sep 30 21:38:22 BST 2007
Hi,
Steve Coast wrote:
> (please reply to the new routing list, sent to dev so that you know
> it's there)
>
> We have about 4 million ways.
...but we don't need to route between them all. The thing to route
between, I think, would be "fork" nodes - i.e. nodes where more than two
ways join. I have a query running to find these (as the first stage to
producing more routing-optimised data). At the start and end of the
route, assuming the starting/ending node isn't itself a fork node, you
end up with a choice of (at most) four routes (from the next fork node
in each direction on the start and end ways) and pick the
shortest/fastest best of those four.
My query isn't finished yet (I have ideas for a much faster version, but
I want this one to get finished first). But I'm going to go out on a
limb and say that we have about a million fork nodes.
> An all pairs routing graph has therefore 16 million entries. We
> assume a route from a to b may be different than from b to a due to
> one way streets, turn restrictions etc.
I think your maths here is wrong (if I'm wrong, a lot of the rest of
this mail is going to be irrelevant). If I had 4 nodes I was routing
between (and assuming I respect one-way restrictions), I'd have
1->2, 1->3, 1->4, 2->1, 2->3, 2->4, 3->1, 3->2, 3->4, 4->1, 4->2, 4->3
= 12 routes. Basically, if I have N nodes, I have N*(N-1) routes. For
your figure of 4 million, I end up with 16 trillion routes. This enters
the realm of the unrealistic storagewise (at least for OSM, probably not
for Google). I'm not sure you need to be quite so pessimistic about the
one-way stuff, though - if a route's not affected by one-way stuff, you
could just store one of the directions and use some simple logic to
retrieve that for the other direction too if the direction you're
looking for doesn't have a route.
For 4 nodes, I then have
1->2, 1->3, 1->4, 2->3, 2->4, 3->4
= 6 routes. For N nodes, (N*(N-1))/2. Assume 25% of routes have a
differing reverse direction, then we have ((N*(N-1))/2)*1.25 ->
5(N*(N-1))/8.
Going with the million-fork-node assumption from above, all-pairs would
be 5*(1000000*999999)/8 ~= 625,000,000,000 routes. Assume 75% of these
are unroutable (different continents, mixing of cycle paths and
motorways, etc.) and you have 156,250,000,000 routes.
> Almost every route is going to be something like "take street a,b,c,
> get on a motorway, come off and do streets d,e,f". That is, almost
> every route is going to be fairly simple.
Not if you also do shortest route - these tend to be significantly
longer. I've also had ideas about things like "scenic route" (weight
ways on scenicness based on surrounding streets, green areas, etc.),
"shortest route, but keep it to vaguely sane roads - i.e.
non-residential, non-track, etc.". Some kind of
use-trains-or-flights-when-appropriate thing would rule too.
> Let us a assign L to be the
> length of route in number of ways traversed. Of course many routes
> won't exist (you can't drive across oceans).
Even assuming you do the famous Google "swim across Atlantic ocean"
thing, this is effectively the concatenation of two routes.
> Let us assign N to be the number of types of route, and lets say N is
> 3 for car, walk, cycle.
I'd guess at N=7 or more. Car shortest, car scenic, car fastest. Cycle
normal, cycle avoid-fast-roads (or something). Walk. The
use-trains-or-flights-when-appropriate thing. Doubtless other cool
things that people think of.
> Let us assign B to be the the number of bytes per way id. B=2, 16
> bits can currently encode all ways.
Er, how? We have more than 65535 ways - I make that you need at least
22 bits as of right now (and 23 before very long). However, since
you're (presumably) not going to be storing the descriptions, you need
to do a query on ways anyway. By making your query return slightly more
data, you can significantly save on route storage. At each fork node,
get all the ways which proceed from that node, sorted by way ID. You
can then store routes as:
Start at node 12345
First fork node: take way with the lowest ID = 0
Second fork node: take way with the third lowest ID = 2
etc.
Assuming the most ways joining at a single node is 16 (if anything, 8
seems more likely), you need 4 bits per step of the route (making 16 the
shinier choice, since 4 bits/step fits much more nicely into bytes).
> The total disk space to store all routes is therefore
>
> ~= 4 * 10e6 * (L * B) * N
By my calculations thus far:
~= 1.5625 * 10e11 * (L * 0.5) * N
> I'm going to go out on a limb here, and almost any route I've got
> directions for is about 10 steps. Some will be more of course, but
> many will be much less.
I'd go with 20, bearing in mind the shortest-route thing and so on.
It's easy to get 50+ steps on these routes, so that seems more
reasonable to me.
> ~= 4 * 10e6 * (10 * 2) * 3 ~= 240 million bytes.
~= 1.5625 * 10e11 * (20 * 0.5) * 7
~= 10,937,500,000,000 bytes
~= 10,430,813 MB
~= 10,186 GB
~= 10TB
> Anyway the key point is we can pre-compute all routes and serve them
> from one box fairly easily if we want to. Even if you add an order of
> magnitude complexity, it's doable.
With 10TB - its doable, but I think we can drop "easy"... You could cut
it down to 1 type of route from 7 for starters, getting you down to a
couple of TB, but it's still pretty big.
> This, I'm told, is exactly what Google do. But their dataset is a bit
> bigger.
Google have lots of 10TBs, I'm guessing OSM doesn't (yet) have one 10TB...
But! All is not lost. Almost all of that incredible number of routes
will never be asked for. So you can probably cache every route anyone
ever asks for.
Looking back to the precalculation goal, though: 99,9% of the routes you
have will be A->B->the entirety of some other route->C->D. You could
store a "jump" to another route as being "take the 16th highest way ID"
followed by a route ID. You can recognize when to do this jumping by
doing some variant of Longest Common Subsequence matching whenever you
generate a new route. Unfortunately, LCS is itself NP-complete and
CPU-intensive.
Long story short: I think it *might* be possible to precalculate stuff.
But I think if you go down that road, you need to be using every trick
in the book to save space (or getting someone to donate you a very large
--- and growing --- storage array). Assuming you're saving space, this
will extend the time you spend precalculating. Which means you need to
be thinking of something more like a monthly or two-monthly update cycle
for the routing DB. I don't know how good a match this is to OSM, at
least at its current stage of life.
--
Jon
More information about the Routing
mailing list