[Tilesathome] Distributed server idea

Thu Jul 19 12:02:23 BST 2007

Hi,

> I sketched an idea, and put notes at:
> http://wiki.openstreetmap.org/index.php/Tiles%40home/Distributed_Server

I heard a talk by Richard Stallman once in which he said that the  
idea of user accounts and passwords was inherently fascist.

While I'd say that that was a bit over the top, I still detect a  
desire to "control" in the whole GPG key business here. The reason  
for this is mistrust vis-a-vis the participants in the project, which  
I believe is inappropriate and hurting the climate.

I do understand that such measures have to be put in place where  
system security is at risk, e.g. when you download packages from a  
Debian repository you really want to make sure the package is from a  
trusted source.

But we're talking about map images here, and the worst thing that can  
happen is someone defacing the map, which can be rectified quickly.  
It is like a public forum where there's a danger of people posting  
porn or inappropriate texts. It is like Wikipedia where you can  
replace an article by a porn image anytime. That can happen, and will  
happen, but there's no reason to build a whole control and  
authentication and verification structure just because of that. (Or,
at least, we could leave the decision on how "fascist" they want to
be to the tile server operators, because they are the ones in charge,
legally.)

Regarding distribution, yes, DNS is a cool idea.

The weak point with any distribution is knowing where to get the data  
from. You can either have the client request data from one specific  
server, and have the server act as a proxy/cache (this is what you  
would do if you had a proper server farm at one location), or you can  
have the server return HTTP redirects to where the content actually  
is (makes more sense in a world-wide distributed environment).

DNS has a proven record of good, fast database lookups. A DNS lookup
is much, much faster than an HTTP server connection that gets answered
with a redirect message. So this is good. In detail it would probably
look like this:

If renderer X renders tileset 1234,2345 and uploads to server Y, with  
Y being just any simple Apache setup anyone can install, then Y would  
afterwards notify a central DNS server that it now has this tileset.  
The central DNS server would from then on return Y's IP address for  
any queries like "1234-2345.tiles.openstreetmap.org". The clients
(e.g. OpenLayers) would then have to use the proper host names for
every tile, so whenever they request a tile that is within tileset
1234,2345 they will request e.g.
http://1234-2345.tiles.openstreetmap.org/tiles/13/2468/4690.png,
which automatically resolves to server Y.

This requires OpenLayers et al. to put that extra arithmetic in  
(calculating the hostname of a host where the tile will be found).
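
A minimal sketch of that extra arithmetic, in Python for brevity (a real client such as OpenLayers would do the same in JavaScript). It assumes that a tileset is named after its zoom-12 tile, which is consistent with the example above (2468/2 = 1234, 4690/2 = 2345) but is not spelled out anywhere in this mail. The names TILESET_ZOOM, tileset_for_tile and tile_url are made up for illustration.

```python
# Assumption: tilesets are addressed by their zoom-12 tile coordinates.
TILESET_ZOOM = 12
BASE_DOMAIN = "tiles.openstreetmap.org"


def tileset_for_tile(z, x, y):
    """Return the (x, y) of the tileset that contains tile z/x/y."""
    if z < TILESET_ZOOM:
        # Lower zoom levels span several tilesets; not covered by this sketch.
        raise ValueError("zoom level is below the tileset zoom")
    shift = z - TILESET_ZOOM
    # Each zoom step doubles the coordinates, so shift back down to zoom 12.
    return x >> shift, y >> shift


def tile_url(z, x, y):
    """Build the per-tileset URL a client would request for tile z/x/y."""
    tx, ty = tileset_for_tile(z, x, y)
    return f"http://{tx}-{ty}.{BASE_DOMAIN}/tiles/{z}/{x}/{y}.png"


print(tile_url(13, 2468, 4690))
# prints http://1234-2345.tiles.openstreetmap.org/tiles/13/2468/4690.png
```

The client never talks to a central HTTP server here; the only central component involved is the DNS lookup for the computed host name.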

To support clients that cannot use the extra logic required here, we
could still provide a redirection service (such a service can be run
anywhere, by anyone) that answers standard HTTP requests like
http://tiles.freds-own-server.com/tiles/13/2468/4690.png, does the
computation and then EITHER fetches and returns the tile OR returns an
HTTP redirect.
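
The redirect variant of such a service could look roughly like the following. This is only a sketch of the decision logic, not a complete web server: it again assumes zoom-12 tilesets, and redirect_for is a hypothetical helper that returns a status code and headers for whatever HTTP framework or CGI wrapper sits in front of it.

```python
import re

# Assumption: tilesets are addressed by their zoom-12 tile coordinates.
TILESET_ZOOM = 12
BASE_DOMAIN = "tiles.openstreetmap.org"
TILE_PATH = re.compile(r"/tiles/(\d+)/(\d+)/(\d+)\.png")


def redirect_for(path):
    """Map a standard tile request path to (status, headers)."""
    match = TILE_PATH.fullmatch(path)
    if match is None:
        return 404, {}
    z, x, y = (int(group) for group in match.groups())
    if z < TILESET_ZOOM:
        # Lower zoom levels are not covered by this sketch.
        return 404, {}
    shift = z - TILESET_ZOOM
    host = f"{x >> shift}-{y >> shift}.{BASE_DOMAIN}"
    # Send the client to the per-tileset host name; DNS does the rest.
    return 302, {"Location": f"http://{host}{path}"}


status, headers = redirect_for("/tiles/13/2468/4690.png")
print(status, headers["Location"])
# prints 302 http://1234-2345.tiles.openstreetmap.org/tiles/13/2468/4690.png
```

The "fetch and return" variant would use the same host name computation and then proxy the response instead of sending the Location header.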

In a world like this, we would have:

* Rendering client - renders as usual, uploads to any tile server;  
either you have your own tile server, or you use a friend's, or some  
tile server that has been set up for your area, or whatever. I think  
that for the time being we can work with configured tile servers in  
the rendering clients, and only later may we require a sort of  
"please tell me where I can upload to" mechanism.

* Tile server - accepts uploads, notifies central server that it now  
has data for tileset so-and-so (perhaps also with timestamp so that  
central server can find out who's the most current) - optionally  
mirrors full tilesets from other servers

* Central server ("Load server") - really only one big DNS server  
that contains a database with one entry per tileset, pointing to the IP
address of the server carrying that tileset; perhaps also records
alternatives and does availability checks/round robin; accepts  
notifications from tile servers

* Key server - not required as it's fascist ;-) leave it up to the  
tile servers whether they want to accept upload from just any  
anonymous client or whether they want to agree on some sort of  
authentication with their peers - not our business
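
To make the interplay between tile server and central server in the list above a bit more concrete, here is a rough in-memory sketch of the bookkeeping the Load Server would need. It is not a DNS server; it only shows the notify/lookup logic. The class and method names are invented, the "most recent timestamp wins" rule is one possible reading of the timestamp idea, and the IP addresses are documentation-range placeholders.

```python
class LoadServer:
    """Toy model of the central server: one entry per tileset."""

    def __init__(self):
        # (tileset_x, tileset_y) -> (ip_address, timestamp)
        self._entries = {}

    def notify(self, tileset, ip, timestamp):
        """Called by a tile server after it has received an upload.

        Returns True if the entry was updated, False if an entry with
        the same or a newer timestamp was already known.
        """
        current = self._entries.get(tileset)
        if current is None or timestamp > current[1]:
            self._entries[tileset] = (ip, timestamp)
            return True
        return False

    def resolve(self, hostname):
        """Answer a query like '1234-2345.tiles.openstreetmap.org'."""
        label = hostname.split(".", 1)[0]
        try:
            tx, ty = (int(part) for part in label.split("-"))
        except ValueError:
            return None  # not a tileset host name
        entry = self._entries.get((tx, ty))
        return entry[0] if entry else None


server = LoadServer()
server.notify((1234, 2345), "192.0.2.10", timestamp=1184842943)
print(server.resolve("1234-2345.tiles.openstreetmap.org"))
# prints 192.0.2.10
```

Recording alternatives and doing availability checks would mean keeping a list of candidates per tileset instead of a single winner.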

The main development issues here would be:

* Create a "base" tile server for people to run, ideally a Debian  
package not dependent on PHP and MySQL, just a simple CGI that  
accepts an upload and stuffs it into the file system. People will of  
course be free to run more complex tile servers with lots of bells  
and whistles, but this will serve as the entry level thingie everyone  
can use.

* Create the custom DNS server we'll need for the Load Server. This  
is definitely a C/C++ task. Needs MySQL as a backing store but must  
keep all addresses in memory. Should be doable with a sub-200 MB  
memory footprint. Cannot use BIND. Database-backed DNS servers exist;
would have to check if they can be told to also keep a big memory cache.
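
A back-of-the-envelope check of the memory figure, written in Python only to show the arithmetic (the real thing would be C/C++ as said above). The assumptions are mine: tilesets sit at zoom 12, every possible tileset gets a slot, and each slot holds exactly one IPv4 address with no timestamps or alternatives. FlatTable and table_bytes are illustrative names.

```python
import ipaddress

BYTES_PER_IPV4 = 4


def table_bytes(zoom):
    """Size of a flat table with one IPv4 slot per tileset at this zoom."""
    side = 2 ** zoom
    return side * side * BYTES_PER_IPV4


class FlatTable:
    """Flat byte array indexed by tileset coordinates; all zeros = unset."""

    def __init__(self, zoom):
        self.side = 2 ** zoom
        self.data = bytearray(table_bytes(zoom))

    def _offset(self, tx, ty):
        return (tx * self.side + ty) * BYTES_PER_IPV4

    def set(self, tx, ty, ip):
        off = self._offset(tx, ty)
        self.data[off:off + BYTES_PER_IPV4] = ipaddress.IPv4Address(ip).packed

    def get(self, tx, ty):
        off = self._offset(tx, ty)
        raw = bytes(self.data[off:off + BYTES_PER_IPV4])
        if raw == b"\x00\x00\x00\x00":
            return None
        return str(ipaddress.IPv4Address(raw))


print(table_bytes(12) / 2 ** 20)
# prints 64.0  (MiB for 4096 x 4096 tilesets)
```

Under these assumptions the address table alone is 64 MiB, which leaves headroom under 200 MB for timestamps or a second address per tileset.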

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00.09' E008°23.33'