[Tile-serving] [openstreetmap/mod_tile] Update render queue limits (#152)

Sun Dec 3 18:56:35 UTC 2017

Thought I'd throw in some clarifying remarks.

Assuming this information is still accurate, I haven't looked at the code in quite a while so I don't know if things have changed in this respect.

Priority Request Queue, Request Queue, Low Priority Request Queue are all meant for on the fly rendering. i.e. if a user requests a tile in mod_tile and it is out of date in some way or another, mod_tile delays serving that tile while it tries to render it. So these renderings are "time critical" as they delay the serving of a tile. Priority Request Queue thereby handles tiles that have no stale cached tile. I.e. if the on the fly rendering fails, mod_tile will deliver a 404 tile not found response. That is why this is the highest priority. Request Queue servers tiles that have a cached stale tile for which we know the data has changed, so if mod_tile ends up serving the cached version as renderd doesn't render the tile in time, the user will see "wrong"/"old" geometry/features. Low Priority Queue is for when the style sheet has changed, but not the data it self.

As all of these queues are designed for on the fly rendering, their size is limited to 32 entries. It is not reasonable to believe renderd will render more than 32 metatiles in the time before mod_tile times out and serves either a stale or 404 tile. So there is no point in having a longer queue there, as that would just add to the starvation issue.

If a tile tries to be added to lets say the request queue, but the queue is full, it spills over into the dirty queue, which is the "large batch background" rendering queue which isn't "time critical" anymore, as there is no user directly waiting for the answer. This queue is there to try and steady the load on the server and make sure that the server is still doing something useful during low load times.

So increasing the queue length beyond 2 minutes definitely seems good. However, there are definitely a couple of issue with increasing the dirty queue size. As as been pointed out, the dirty queue can easily be starved out. Now if during a burst of high load the priority queue with missing tiles overflow, some of those missing tiles request end up in the dirty queue. If the load then drops a little so that the priority queue is no longer full, but e.g. the low priority queue is still saturated, then renderd will only render those style sheet changed tiles, but never get around to rendering the missing tile that has dropped into the dirty queue. If a new user now tries to look at that dirty tile, mod_tile will submit it to renderd again, but as it is already queued (in the dirty queue) renderd won't requeue it into the priority queue, even if the priority queue could now hold the request. So it is possible to starve out missing metatiles in the dirty queue, which is very bad. With a shorter dirty queue this risk of starving important tiles out is somewhat reduced.

Furthermore, back then (not sure if that has changed by now), renderd didn't really use any library routines. So instead of using a decent quality existing implementation of a priority queue, I coded up my own implementation. And that implementation was in the category "good enough at the time", not really in the "good" category.

So ideally that hole queuing implementation gets replaced by a proper library implementation of a priory queue with re-queueing of higher priority requests and with something like an epsilon greedy dequeueing strategy to reduce starvation issues.     

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openstreetmap/mod_tile/pull/152#issuecomment-348805403
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/tile-serving/attachments/20171203/9ee64009/attachment-0001.html>