[Tilesathome] [server]

Tue Apr 6 19:19:36 BST 2010

Florian Lohoff wrote:
> On Sun, Mar 28, 2010 at 11:04:04AM +0200, Patrick Kilian wrote:
>> But in the end we have to accept that the ~110 active clients (many of
>> them multicore and quite fast) can render batches of simple prio4 tiles
>> which are expired by my oldtile script or the oldtile checker faster
>> than the server can handle the upload. And no removal of code checking
>> for that condition is going to fix it. All we (might) get is a hung
>> server drowned in upload requests.
> 
> As I already mentioned, I think a year ago: when looking at the load
> of my renderers it seems very "spiky". When the hourly(?) render
> prio 2 stuff comes in, the clients start rendering. As soon as the
> prio 2 tiles are done we come back to the prio4 and lots of clients
> stall because of the incoming queue.

I'll have to look at which interval the queue is checked at, but I
believe the api call we use to determine changed tiles has hourly
granularity, so one hour would be the minimum interval we can handle
with the current codebase.

> Here is a munin graph of an 8-core, 16 GB renderer sitting idle >50% of the
> time:
> 
> https://hydra.gt.owl.de/munin/lab.rfc822.org/bs4.lab.rfc822.org-cpu.html
> 
> In the end it comes down to loading the clients more evenly instead
> of the spiky CPU usage, and to loading the server evenly as well.
> 
> I would see 2 solutions/optimizations. One: let the prio2 rerender
> stuff run much more often, e.g. driven by the minutely replication
> diffs. So we sprinkle prio2 into the queue much more often, so that
> half of the clients render prio2 and half of them render the oldtile
> stuff all the time.

I have watched the queue load on the server and it doesn't seem to drain
with those settings on the clients, so basically the only problem is the
bursts at the clients. (Is that really a problem as long as the
server-side upload queue doesn't run empty?)

There used to be a better algorithm on the client which did exponential
backoff, but that was removed. I believe it was because the server was
too fast for all the clients at that time anyway.
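
For reference, a minimal sketch of what such a client-side exponential
backoff could look like. This is not the code that was removed; the
names (backoff_delays, upload_with_backoff, try_upload) and the
numbers (5 s base, factor 2, 300 s cap) are made up for illustration.

```python
import random
import time


def backoff_delays(base=5.0, factor=2.0, cap=300.0, jitter=0.0, rng=None):
    """Yield wait times that grow by `factor` each round, limited to `cap`.

    `jitter` is a fraction of the current delay (0.0 disables it); a little
    randomness keeps many clients from retrying at the same instant.
    All defaults are assumed values, not the project's real settings.
    """
    rng = rng or random.Random()
    delay = base
    while True:
        spread = delay * jitter
        yield delay + rng.uniform(-spread, spread)
        delay = min(delay * factor, cap)


def upload_with_backoff(try_upload, max_attempts=8, sleep=time.sleep, **kwargs):
    """Call try_upload() until it returns True.

    `try_upload` is a hypothetical callable that returns False while the
    server's upload queue is full. Returns the number of attempts used,
    or None if every attempt failed.
    """
    delays = backoff_delays(**kwargs)
    for attempt in range(1, max_attempts + 1):
        if try_upload():
            return attempt
        if attempt < max_attempts:
            sleep(next(delays))  # wait longer after each refusal
    return None
```

With jitter left at 0 the waits are 5, 10, 20, 40, 80, 160, 300, 300, ...
seconds, so a client that keeps getting refused quickly stops hammering
the server.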

> 
> Another would be to make it not a strict prio1, then prio2, then
> prio3, 4 ... but rather a kind of "weighted fair queue". So once we have
> a bunch of prio1/2 in the queue we do not start swamping the clients with
> them, but rather try to feed some clients prio4, which (typically) come
> back fast, and so keep the incoming queue from draining.
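
To make the "weighted fair queue" idea concrete, here is a rough sketch
of how the server could hand out jobs. It is a simplified variant in the
style of stride scheduling, not anything from the current codebase. The
class name and the weights (3 for prio2, 1 for prio4) are assumptions
chosen only for the example.

```python
from collections import deque
from fractions import Fraction


class WeightedFairQueue:
    """Hand out jobs from several priority queues in proportion to weights.

    Each priority keeps a "pass" value. The non-empty queue with the lowest
    pass is served next and its pass grows by 1/weight, so a priority with
    weight 3 is served about three times as often as one with weight 1.
    """

    def __init__(self, weights):
        self.weights = dict(weights)  # prio -> relative share (assumed values)
        self.queues = {prio: deque() for prio in self.weights}
        self.passes = {prio: Fraction(0) for prio in self.weights}

    def put(self, prio, job):
        if not self.queues[prio]:
            # A queue that sat empty must not burst when it refills:
            # lift its pass to the lowest pass among the busy queues.
            active = [self.passes[p] for p in self.queues if self.queues[p]]
            if active:
                self.passes[prio] = max(self.passes[prio], min(active))
        self.queues[prio].append(job)

    def get(self):
        """Return (prio, job) for the next job, or None if all queues are empty."""
        ready = [p for p in self.queues if self.queues[p]]
        if not ready:
            return None
        # Ties go to the numerically lower (more urgent) priority.
        prio = min(ready, key=lambda p: (self.passes[p], p))
        self.passes[prio] += Fraction(1, self.weights[prio])
        return prio, self.queues[prio].popleft()
```

With these example weights, a queue holding both kinds of jobs would give
roughly three prio2 jobs for every prio4 job instead of holding back all
prio4 work until prio2 is empty, so some clients keep returning the fast
prio4 tiles the whole time.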