[OSM-talk] Tile caching (osm startpage)

Kai Krueger kakrueger at gmail.com
Sun Mar 7 08:30:10 GMT 2010


On 01/-10/-28163 08:59 PM, Michal Migurski wrote:
>
> On Mar 5, 2010, at 11:34 AM, John Smith wrote:
>
>> On 6 March 2010 01:24, Bernhard zwischenbrugger<bz at datenkueche.com>  wrote:
>>> Google Cache Time:
>>> Cache-Control: public, max-age=22222222  //feels like one month (I
>>> didn't calculate)
>>
>> I'd say it's a bad idea to specify a cache time, instead there is
>> other caching mechanisms to tell if a tile has changed:
>>
>>> ETag: "d096ddafba32c0da609007e224530ccd"
>>
>> This way if a tile never changes you never need to refresh.
>
>
> For what it's worth, the current tile server does specify a cache time as well as an ETag.
>
> % curl -sI "http://tile.openstreetmap.org/14/2627/6331.png"
> 	HTTP/1.1 200 OK
> 	Date: Sun, 07 Mar 2010 02:19:30 GMT
> 	Server: Apache/2.2.8 (Ubuntu)
> 	ETag: "93087c5713c17d9939cac9e341fdd14c"
> 	Content-Length: 26595
> 	Cache-Control: max-age=1036
> 	Expires: Sun, 07 Mar 2010 02:36:46 GMT
> 	Content-Type: image/png
>
> 1,000 sec. max age there is a little over 15 minutes, though when I repeat this request I get expiry times all over the place, from a few minutes to many hours. What currently decides on the cache expiration time?

mod_tile, the Apache module used to serve the tiles, has a fairly
sophisticated mechanism to decide the expiry times, driven by a bunch
of heuristics. With the minutely rendering, we no longer have a periodic
update cycle, so there is no really good way of setting the expiry
times: one would need to guess when in the future a given tile might
change. As that is obviously not possible, we need to trade off between
caching time (reducing server resource usage and client-side latency)
and up-to-dateness, so as not to lose the benefits of the minutely
updates.

The heuristics currently supported (and used) are the following.

First, it decides whether the tile is known to be "dirty", i.e.
outdated. If the tile server is overloaded, or the rendering takes
longer than 3 seconds, mod_tile will serve an old tile rather than wait
until the on-the-fly rendering has finished (again a trade-off between
client-side latency and up-to-dateness). In that case, given that we
know the tile will soon change, the max-age cache parameter is set very
low: 15 minutes plus up to 7 minutes of random jitter.
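
As a rough illustration of the stale-tile case, here is a small Python
sketch. It is not mod_tile's actual C code: the function and constant
names are made up, and the uniform distribution of the jitter is an
assumption. Only the 15 and 7 minute figures come from the description
above.

```python
import random

# Figures from the description above; the names are hypothetical.
STALE_BASE_SECONDS = 15 * 60    # base max-age for a tile known to be dirty
STALE_JITTER_SECONDS = 7 * 60   # upper bound of the random jitter


def stale_tile_max_age(rng=random):
    """Return a max-age in seconds for a tile known to be outdated.

    Assumes the jitter is drawn uniformly from 0..7 minutes.
    """
    return STALE_BASE_SECONDS + rng.randint(0, STALE_JITTER_SECONDS)


def cache_control_header(max_age):
    """Format the max-age value as a Cache-Control header line."""
    return f"Cache-Control: max-age={max_age}"


if __name__ == "__main__":
    print(cache_control_header(stale_tile_max_age()))
```

Every value produced this way falls between 900 and 1320 seconds, i.e.
the 15 to 22 minute range.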

If the tile served is not stale, there are another 3 heuristics: a
zoom level based heuristic, a last modified heuristic and a known
planet update cycle, if one exists.

The zoom level based heuristic allows setting the minimum max-age
caching time depending on whether the tile served is a low zoom, medium
zoom or high zoom tile. The idea behind this is that low zoom tiles
(even though they are affected by all changes) don't appear to change
much. Thus it seems reasonable to allow clients to cache these much
longer, as the effect of a stale tile from cache is probably smaller.

The current setup of tile.osm.org, I think, doesn't use this heuristic
though and sets the minimum max-age caching time to 3 hours + 3 hours
of random jitter for all zoom levels, even though the minutely tile
expiry doesn't actually expire low zoom tiles, which thus only change
if a re-render is manually requested. So I think it would be good to
increase the time to cache low zoom tiles, as in the current setup it
shouldn't affect things negatively.

The last modified heuristic tries to guess how likely it is for a tile
to change. E.g. a tile in the middle of the Pacific is probably not
going to change anytime soon, so it wouldn't matter to give it e.g. a
max-age of a week. A tile in, say, central Berlin is more likely to
change. So the heuristic guesses how likely a tile is to change in the
future based on how long it has been since it last changed: max-age
scales linearly with the time since the tile was last modified, with a
tunable slope parameter. As it is fairly unclear how well this heuristic
works, I believe the osm tile server still has this at its default, i.e.
turned off completely.
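
To make the linear scaling concrete, a minimal Python sketch (again not
the real mod_tile code; the function and parameter names are
hypothetical, and representing "turned off" as a slope of 0 is an
assumption):

```python
def last_modified_max_age(seconds_since_change, slope=0.0):
    """Guess a max-age (seconds) from how long a tile has been unchanged.

    seconds_since_change: time since the tile was last modified.
    slope: tunable factor; 0.0 is assumed here to mean "heuristic off".
    """
    if seconds_since_change < 0:
        raise ValueError("seconds_since_change must not be negative")
    return int(slope * seconds_since_change)


if __name__ == "__main__":
    one_day = 24 * 3600
    # A tile untouched for 30 days with slope 0.25, and with the
    # heuristic switched off.
    print(last_modified_max_age(30 * one_day, slope=0.25))
    print(last_modified_max_age(30 * one_day))
```

With a non-zero slope, a tile that has not changed for a long time gets
a proportionally long max-age, while a recently edited tile gets a
short one.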

The last "heuristic" is the one based on planet update cycles. For
those servers that have a planet update cycle (i.e. not tile.osm.org),
you don't have to guess and can just set the expiry time to when the
next update cycle begins. This is the most efficient from a caching
point of view, but doesn't work with minutely updates.

The final max-age handed out by the server for clean tiles is then the
maximum of the 3 heuristics, capped at a week.
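
The combination step can be sketched in Python as follows. This is an
illustration under assumptions, not mod_tile's implementation: the names
are invented, each heuristic is assumed to have already produced a value
in seconds, and the random jitter is left out.

```python
ONE_WEEK_SECONDS = 7 * 24 * 3600  # cap mentioned above


def clean_tile_max_age(zoom_min_age, last_modified_age,
                       planet_cycle_age=None):
    """Combine the heuristic results for a clean (non-stale) tile.

    zoom_min_age: minimum max-age from the zoom level heuristic.
    last_modified_age: result of the last modified heuristic (0 if off).
    planet_cycle_age: seconds until the next planet update cycle, or
        None for servers without such a cycle.
    """
    candidates = [zoom_min_age, last_modified_age]
    if planet_cycle_age is not None:
        candidates.append(planet_cycle_age)
    # Take the largest suggestion, but never exceed one week.
    return min(max(candidates), ONE_WEEK_SECONDS)


if __name__ == "__main__":
    three_hours = 3 * 3600
    print(clean_tile_max_age(three_hours, 0))
```

With a 3 hour zoom-level minimum and the last modified heuristic off,
this returns 10800 seconds; any larger suggestion wins until it hits
the one week cap of 604800 seconds.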

The random jitter factor is there mostly for servers with weekly update
cycles. It keeps all cached tiles from expiring at exactly the same
time, which would otherwise overwhelm the tile server with a sudden
burst of requests.

As of a couple of hours ago, the mod_tile code also supports tile
expiry based on the hostname header, so it would theoretically be
possible to do something like have cache.tile.osm.org hand out expiry
headers of e.g. a month. But it isn't clear how one would decide who to
send to a hypothetical cache.tile and who to the normal tile server. It
is also not clear what it would do to osmf's own (currently still
relatively limited) caching, as it would then require two copies of
each tile to be kept by the accelerator caches, doubling the required
resources. So I am not sure if or in what form this would potentially
happen, even though I do think it is a good idea from the client
perspective.
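
Purely to illustrate the idea of hostname-based expiry, a Python sketch
follows. It does not reflect how mod_tile configures or implements
this: the table, the function name and the Host header handling are all
assumptions, and cache.tile.osm.org is the hypothetical hostname from
the paragraph above.

```python
# Hypothetical per-hostname minimum max-age values, in seconds.
HOST_MIN_MAX_AGE = {
    "cache.tile.osm.org": 30 * 24 * 3600,  # "e.g. a month"
}
DEFAULT_MIN_MAX_AGE = 3 * 3600  # the 3 hour minimum used otherwise


def min_max_age_for_host(host_header):
    """Pick a minimum max-age based on the request's Host header."""
    # Drop an optional ":port" suffix and normalise the case.
    host = host_header.split(":")[0].strip().lower()
    return HOST_MIN_MAX_AGE.get(host, DEFAULT_MIN_MAX_AGE)


if __name__ == "__main__":
    print(min_max_age_for_host("cache.tile.osm.org"))
    print(min_max_age_for_host("tile.openstreetmap.org"))
```

How such a value would interact with the one week cap for clean tiles
is not covered here.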


To cut it short, the current tile.osm.org server basically hands out
expiry times of 15 - 22 minutes for stale tiles and 3 - 6 hours for
clean tiles, with a bunch of further parameters that could be tuned.



>
> The Phnom Penh issue all sounds like a job for a CDN like Akamai's or a caching proxy (i.e. squid-cache.org) closer to Cambodia. Bernhard, these are not difficult to set up for yourself if you are interested, and require little knowledge of the actual map.

Having a CDN would definitely help and would indeed probably be the
preferred option in this specific case. But it would require osmf to
have hosting facilities in various countries. Great if it were
possible, but I am not sure it is at the moment.

For the past few days, there has been a trial to see how well a CDN /
caching proxy would work in our setup, with a.tile.osm.org redirecting
to a simple proxy server at a different hoster (although in London,
too). It is too early to say much yet, but it does seem like the cache
hit ratios are lower than I would have hoped, with only about 40 - 60%
of requests successfully being served by the proxy without needing to
contact the main server. (
http://munin.openstreetmap.org/openstreetmap/konqi.openstreetmap.html#Squid
for reference )

We will need to see how this all pans out, but I would guess it will 
depend on resources donated to osmf to make some of this happen and 
ensure that the tile serving infrastructure can be expanded in the future.

Kai

>
> -mike.
>
> ----------------------------------------------------------------
> michal migurski- mike at stamen.com
>                   415.558.1610
>
>
>
>
>




