[Strategic] Routing

Sun Mar 6 13:18:12 GMT 2011

Strategic,

    after having read the minutes of the latest IRC discussion, I feel 
that a few basic points about routing have to be said as not everyone 
seems to be clear about them.

1. What do we mean by "doing routing"?
--------------------------------------

* Some people read this as "we run a routing engine on OSMF infrastructure";

* some people read this as "we offer a 'find directions' service on the 
www.openstreetmap.org web site".

It is important to note that these two are independent of each other; we 
have the following options:

* neither operate a routing engine nor offer directions on the web site 
(instead possibly place a link on the web site that says "go to this 
excellent MapQuest site if you want directions")

* offer directions on the web site but not operate a routing engine 
(e.g. by using MapQuest API, or even allowing the user to choose from 
different external offers)

* operate a routing engine but not offer directions on the web site 
(e.g. because we view the routing engine as an internal service we run 
for our mappers so that they may find bugs better)

* offer directions on the web site *and* operate a routing engine 
(likely the web site directions would then be powered by that engine, 
but it does not have to be so).

2. Reasons for and against
--------------------------

The following are some obvious reasons for and against each of the 
options listed above:

* Directions on web site

- for: makes web site more attractive; shows third parties that OSM is 
more than just maps
- against: unclear if we want to be in the business of building cool 
end-user facing applications; on its own offers little value-add for 
mappers; need to invest work to integrate; danger of attracting 
low-quality bug reports

* Operating own routing server

- for: allows the OSM community to define the rules used to build the 
routing graph; allows different uses than just route from A to B
- against: needs work to set up and maintain, needs server 
infrastructure that costs real money

I am sure there are more reasons than these.

3. Routing for quality improvement
----------------------------------

I have the impression that few members of strategic understand what is 
meant by the concept of "having a routing engine could help us improve 
quality".

I can think of at least three things here.

a) Just by "playing" around with a trivial routing interface like the 
one I set up on routingdemo.geofabrik.de (only fastest automobile 
routes; no turn restrictions; no textual result descriptions) people can 
and will find and fix errors in the map. To prove this point, see 
yesterday's postings on the talk-ca mailing list after I had enabled 
routing for Canada on that web site. That simple and playful process 
already finds wrong oneway bits, connectivity problems and so on.

We do not need to operate our own routing server to have these benefits, 
but it certainly helps. To my knowledge, none of the existing servers 
that offer world-wide routing based on OSM data is open source, so we'll 
never know how the router computes what it does, and can only guess what 
idiosyncrasies in our data lead to a certain result. And since we don't 
operate them, we have little control over when and how often data gets 
updated. In the concrete Canada example, by contrast, I could simply run 
a manual data update after some problems had been fixed so that mappers 
could see the results of their work quickly.

b) A very important bit of any routing engine is the extraction of the 
routing graph from OSM data, i.e. the interpretation of OSM data for 
routing. This starts with simple questions like "how fast can we assume 
to travel on a motorway in Bolivia", but includes also grey areas like 
"can the routing engine instruct a motor vehicle to make a 165 degree 
turn", or "can the routing engine assume that a pedestrian can cross a 
road of type <X> at will", and "what kind of instructions need to be 
issued for an intersection with lots of little connecting lanes".

At the moment, everyone who writes a routing engine has to think about 
these cases and write code for them, and everyone does it differently. 
This means that there are no clear rules on how to interpret OSM data, 
and in consequence mappers are not clear about what a router will do if 
they map something in a certain way. We can assume (or hope?) that if we 
have a routing engine that we control, then that will aid in 
quasi-standards forming, just like the common map that we have aids in 
forming tagging standards. People aren't expected to "tag for the 
router" just as they don't "tag for the renderer" now, but still the 
existence of a project-wide routing engine that interprets our data in a 
way that can be influenced by the community would probably help a lot in 
shaping up our data for routing *and* developing a basic standard for 
interpreting it.
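
To make the idea of "interpreting OSM data for routing" a bit more 
concrete, here is a minimal sketch of what a community-editable set of 
rules might look like. Everything in it is an assumption made up for 
illustration: the function name, the speed values, and the rule that 
motorways default to one-way are not taken from Gosmore, the Karlsruhe 
code, or any other existing engine.

```python
# Hypothetical default speeds in km/h per highway type. The values are
# illustrative only and not taken from any existing routing engine.
DEFAULT_SPEED_KMH = {
    "motorway": 110,
    "primary": 80,
    "residential": 30,
}


def interpret_way(tags):
    """Turn the tags of one OSM way into car routing-graph edge attributes.

    Returns None when the way is not usable by a car under these rules.
    """
    highway = tags.get("highway")
    if highway not in DEFAULT_SPEED_KMH:
        return None

    # An explicit numeric maxspeed overrides the assumed default.
    speed = DEFAULT_SPEED_KMH[highway]
    maxspeed = tags.get("maxspeed", "")
    if maxspeed.isdigit():
        speed = int(maxspeed)

    # Assumed rule: motorways are one-way unless tagged otherwise.
    oneway = tags.get("oneway")
    if oneway is None:
        oneway = "yes" if highway == "motorway" else "no"

    return {
        "speed_kmh": speed,
        "forward": oneway != "-1",
        "backward": oneway in ("no", "-1"),
    }
```

The point is not these particular rules but that they sit in one place 
where mappers can read them, argue about them, and see what the router 
will do with a given way.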

c) I can also think of a number of automated quality checks run on top 
of a routing engine (for example creating distance matrices). These 
could use a standard API (and as such also use a service operated by 
someone else), but having our own routing engine would allow us to make 
bulk queries more efficient. For example, a route on 
routingdemo.geofabrik.de takes less than a millisecond to compute, but 
more than 100 milliseconds until it arrives at the client through the 
API (because of the creation and parsing of XML messages, gzip 
compression, and network latency). So if you have the option of 
accessing the routing engine directly (which we could at least in theory 
provide were we running our own), such things can be done much more 
efficiently.
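
As a rough illustration of such a check, the sketch below builds a 
distance matrix over a set of points and flags pairs that have no route 
at all or whose route is much longer than the straight line. The 
`route_km` callable stands in for whatever routing engine is used and is 
purely hypothetical, as are the function names and the detour threshold 
of 3.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def suspicious_pairs(points, route_km, max_detour=3.0):
    """Query every ordered pair of points and report likely data problems.

    route_km(a, b) is a hypothetical hook into a routing engine; it is
    assumed to return the route length in km, or None if no route exists.
    """
    found = []
    for a in points:
        for b in points:
            if a == b:
                continue
            dist = route_km(a, b)
            if dist is None:
                # Possible connectivity problem or wrong oneway tagging.
                found.append((a, b, "no route"))
            elif dist > max_detour * haversine_km(a, b):
                found.append((a, b, "long detour"))
    return found
```

Both directions of each pair are queried on purpose, since a wrong oneway 
bit often shows up in one direction only. For 1000 points this means 
999,000 queries: at more than 100 milliseconds each through the API that 
is over 27 hours if run one after another, whereas at less than a 
millisecond each it is under 17 minutes.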

Personally I find these points pretty convincing and reason enough to 
want a project-wide proper routing server but then again it is not 
something that OSMF necessarily has to do - it could also be done for us 
by someone else, a sponsor perhaps.

4. Choice of routing engine
---------------------------

If we are after directions on our web site and do not want to run our 
own routing engine then we could simply add any and all routing engines 
to our web site; no need for us to spend time choosing a specific one.

If we want to run our own server then, as far as I can see, there are 
currently only two Open Source products that can be run on the whole 
planet. One is Nic Roets' Gosmore engine (PD license), and the other is 
the Contraction Hierarchies implementation from Karlsruhe University (AGPL 
license).

Nic Roets is a bright guy and a genius programmer. He created Gosmore at 
a time when few of us thought routing on OSM was even possible, and he 
deserves credit for that. The Uni Karlsruhe algorithm has won 
competitions, is actively maintained by staff, and is the subject of a 
number of academic papers. Although both are Open Source, neither has 
until now attracted much contribution from the OSM community. I'll try 
to spare you the technical details but because of the algorithm used, 
Gosmore can offer more flexibility (fastest/shortest/bike/pedestrian/HGV 
etc) than the CH implementation; on the other hand the CH implementation 
is very fast (can do something like 1000 queries a minute) and Gosmore 
is slower by _at least_ an order of magnitude. Both algorithms also 
require preprocessing the data, a step that takes time and resources.

For any sort of mass-processing, my money would be on the Uni Karlsruhe 
software because it is so much faster. (It is also the software on 
routingdemo.geofabrik.de, and an earlier version of the same code has 
been used by Cloudmade when they launched their routing.) I also believe 
that if we were to offer directions on the web site, Gosmore alone 
wouldn't be fast enough to handle our number of visitors. However we 
might choose to run a multi-backend routing setup where we have one user 
interface (included in the rails port) that can talk to different 
backends, e.g. it uses the fast Uni Karlsruhe server as long as you just 
want the fastest route from A to B, and switches to Gosmore when you ask 
for the scenic wheelchair-suitable route or so.
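
The multi-backend idea could look roughly like the sketch below. The two 
backend callables and their signatures are hypothetical placeholders and 
not the actual interfaces of Gosmore or the Karlsruhe server; the only 
thing shown is the dispatch rule.

```python
def make_router(fast_backend, flexible_backend):
    """Build one routing entry point in front of two engines.

    fast_backend(start, end) and
    flexible_backend(start, end, profile, criterion) are hypothetical
    callables wrapping the respective routing engines.
    """
    def route(start, end, profile="car", criterion="fastest"):
        # Only the plain "fastest car route" goes to the fast engine;
        # every other combination falls back to the flexible one.
        if (profile, criterion) == ("car", "fastest"):
            return fast_backend(start, end)
        return flexible_backend(start, end, profile, criterion)

    return route
```

The user interface would then only ever call `route`, and backends could 
be swapped or added without changing it.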

Any such decision would however require a thorough analysis of 
infrastructure required, to answer questions like "how many servers of 
what configuration do we need", "how often could we update the data", 
"how many queries can we process". Running two routing engines would not 
necessarily double the amount of hardware required because preprocessing 
for both could be done alternately.

5. Offering an API
------------------

Should we run our own routing server, we can choose to offer a public 
API or not. Not offering an API at all would reduce the positive effects 
we get from our community members using the server in "new and 
unexpected ways"; offering a free-for-all API would probably attract 
lots of freeloaders who code it into their iPhone apps and so on, which 
we don't want either. We could probably find technical measures to make 
the API usable only for people with an OSM account, or simply write a 
policy that clearly asks people to use MapQuest etc. for anything not 
OSM-related.

My own position in all this is a slight "I think we should have our own 
routing server" and I'm ambivalent on whether or not we should put it on 
the web site. But I can see the reasons against (not a core service, 
costs money). The one thing I would have difficulty understanding is if 
we went for routing on the web site but no engine of our own - that 
would be all show and no substance in my eyes.

Bye
Frederik

-- 
Frederik Ramm        www.geofabrik.de
Geofabrik GmbH       Handelsregister: HRB Mannheim 703657
Scheffelstr. 17a     Geschaeftsfuehrung: Frederik Ramm
76135 Karlsruhe      Tel: 0721-1803560-0
ramm at geofabrik.de    Fax: 0721-1803560-9