[Tilesathome] Thoughts about smoothing the load on dev
Jon Schlueter
jonschl at iriscorp.org
Wed Jun 6 12:21:09 BST 2007
On 6/6/07, Martijn van Oosterhout <kleptog at gmail.com> wrote:
> On 6/6/07, Sebastian Spaeth <Sebastian at sspaeth.de> wrote:
> > Martijn van Oosterhout wrote:
> > > 1. Currently the backoff time doubles every time an upload fails, but
> > > goes to zero when it succeeds. This causes a "thundering herd" problem
> > > as soon as the load drops below three. I propose we change the
> > > calculation to cut the failure count in half on successful upload
> > > rather than setting it to zero.
> >
> > Yes, this is what an exponential backoff obviously should do. If it
> > isn't implemented already, I'm all for doing it. No need to make it more
> > complex with random variation and stuff. *2 and /2 should work just fine.
>
> Someone is working on this.
Attached is a patch which does that. It also adds a conditional delay
function, delayOnFailure(), to tilesGen.pl, which uses the same failure
count to delay and scale back server-intensive operations.
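To make the counting rule concrete, here is a minimal sketch of the update the upload.pl hunk performs on each upload attempt. The helper name next_failure_count is hypothetical and does not appear in the patch; the "+1 on failure" branch is an assumption, since that part of upload.pl is outside the hunk shown.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the attached patch: returns the new
# failure count after one upload attempt.
sub next_failure_count {
    my ($failures, $upload_ok) = @_;

    # Failure (assumed behaviour): one more step up the backoff ladder.
    return $failures + 1 unless $upload_ok;

    # Success: subtract one and halve, as the upload.pl hunk does,
    # instead of resetting to zero; never go below zero.
    my $reduced = ($failures - 1) / 2;
    return $reduced < 0 ? 0 : $reduced;
}

# Four failed uploads followed by three successful ones.
my $failures = 0;
for my $ok (0, 0, 0, 0, 1, 1, 1) {
    $failures = next_failure_count($failures, $ok);
    printf "%s -> failure count %.2f\n", ($ok ? "ok  " : "fail"), $failures;
}
```

With this rule a client that has failed four times needs three successful uploads to get back to a count of zero, rather than dropping there immediately.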
tilesGen.pl server bottlenecking issues:
In loop mode, tilesGen.pl currently just spins as fast as possible. If
there are requests in the queue, it pulls one and starts working on it,
even if it has just had multiple failed uploads. This keeps adding load
to the server, since it means requesting another tile to render and
pulling the map data from the database as well. Sometimes that also
results in failed downloads of map data, which show up like this on my
client:
[#21 0% jobinit] Doing tileset 2160,1287 (area around 55.403992,9.887695)
[#21 0% Preproc] Downloading: Map data to data-19232.osm... No data at this location
Trying smaller slices...
[#21 0% Preproc] Downloading: Map data to data-19232-7.osm... No data here either
The end result is a false blank tile being generated and uploaded.
Another issue is that if the request queue is empty for a while,
clients are left sitting idle with a small number of tiles that never
get uploaded.
This patch addresses several bottleneck issues. When the server is
busy, it scales back tilesGen.pl clients running in loop mode; when the
server is running fine, it adds no delay at all.
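As a rough illustration of how the delay scales with the failure count, here is a sketch of the calculation. The function name sleep_delay is hypothetical, and clamping with min() is an assumption of this sketch; the patch itself switches to 3600 only once the count exceeds 14.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical variant of the delay calculation in delayOnFailure().
# Assumption: the delay is clamped to one hour with min(), rather than
# switching to 3600 only once the count exceeds 14.
sub sleep_delay {
    my ($failures) = @_;
    return 0 unless $failures;    # healthy server: no delay at all
    return min(3600, (2 ** $failures) / 4);
}

for my $failures (0, 1, 4, 10, 14, 20) {
    printf "failures=%-2d delay=%s seconds\n", $failures, sleep_delay($failures);
}
```

Perl's built-in sleep works in whole seconds, so a fractional delay such as 0.5 would need something like Time::HiRes to take effect.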
Thoughts, comments? Should we try pushing it out and see what effect it has?
> > > 2. Currently the only way clients get feedback that the server is busy
> > > is when an upload fails. I propose that the server accept uploads but
> > > return an extra code when the load is high (say >2.5) to tell the
> > > client to start backing off before all goes crazy.
> >
> > Sounds good to me. Is it just a different HTTP return code you were
> > thinking of?
>
> Yes, I can't find an appropriate status code to reuse, so I propose
> something like "299 Accepted, Slow down". It should be backward
> compatible with current clients. And if this code is received, don't
> double the count, instead do "$failurecount += 0.1" (I also propose
> renaming the variable to something other than failurecount, but
> anyway....).
That would fit in nicely with this patch as well.
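A sketch of how a client might combine the two ideas, assuming the server really did return the proposed status. The function name update_failure_count is hypothetical, 299 is only the code suggested in this thread (not a registered HTTP status), and none of this is in the attached patch.

```perl
use strict;
use warnings;

# Hypothetical client-side handling of the proposed "299 Accepted, Slow
# down" reply. 299 is only the code suggested in this thread, not a
# registered HTTP status, and this function is not part of the patch.
sub update_failure_count {
    my ($failures, $status) = @_;

    if ($status == 299) {
        # Upload accepted, but the server is loaded: nudge the count up.
        return $failures + 0.1;
    }
    if ($status >= 200 && $status < 300) {
        # Plain success: reduce the count as in the upload.pl hunk.
        my $reduced = ($failures - 1) / 2;
        return $reduced < 0 ? 0 : $reduced;
    }
    # Anything else counts as a failed upload.
    return $failures + 1;
}

my $failures = 2;
for my $status (299, 299, 200, 503) {
    $failures = update_failure_count($failures, $status);
    printf "HTTP %d -> failure count %.2f\n", $status, $failures;
}
```

Since the count would then be fractional, the delay calculation has to cope with non-integer exponents, which Perl's ** operator does.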
-------------- next part --------------
Index: tilesGen.pl
===================================================================
--- tilesGen.pl (revision 3113)
+++ tilesGen.pl (working copy)
@@ -139,6 +139,16 @@
while(1)
{
reExecIfRequired();
+ # add in a delay before getting more work from dev server if we already are having trouble uploading files
+ delayOnFailure();
+
+ # conditionally try to upload all failed tiles even when there are no new tiles being rendered....
+ if(getIdle(1) > 300)
+ {
+ upload();
+ }
+ uploadIfEnoughTiles();
+
my ($did_something, $message) = ProcessRequestsFromServer();
uploadIfEnoughTiles();
if ($did_something == 0)
@@ -190,6 +200,32 @@
print "\nGNU General Public license, version 2 or later\n$Bar\n";
}
+sub delayOnFailure()
+{
+ my $failures;
+ my $failFile = $Config{WorkingDirectory} . "/failurecount.txt";
+ if (open(FAILFILE, "<", $failFile))
+ {
+ $failures = <FAILFILE>;
+ chomp $failures;
+ close FAILFILE;
+ }
+ elsif (open(FAILFILE, ">", $failFile))
+ {
+ $failures = 0;
+ print FAILFILE $failures;
+ close FAILFILE;
+ }
+
+ # sleep for (2 ** failures) / 4 seconds (0.5, 1, 2, 4, ...); more than 14 failures sleeps a flat 1 hour (3600 seconds)
+ if ($failures)
+ {
+ my $sleepdelay=($failures > 14) ? 3600 : ((2 ** $failures)/4);
+ statusMessage("Sleeping due to upload failures[".$failures."] sleeping for " . $sleepdelay . " seconds", $Config{Verbose}, $currentSubTask, $progressJobs, $progressPercent,0);
+ sleep ($sleepdelay);
+ }
+}
+
sub uploadIfEnoughTiles
{
my $Count = 0;
@@ -392,6 +428,7 @@
if (-s $DataFile == 0)
{
+ delayOnFailure();
printf("No data at this location\n");
printf("Trying smaller slices...\n");
@@ -401,6 +438,7 @@
for (my $i = 0 ; $i<10 ; $i++)
{
+ delayOnFailure();
$URL = sprintf("http://%s:%s\@www.openstreetmap.org/api/0.4/map?bbox=%f,%f,%f,%f",
$Config{OsmUsername}, $Config{OsmPassword}, ($W1+($slice*$i)), $S1, ($W1+($slice*($i+1))), $N1);
my $partialFile = "data-$PID-$i.osm";
Index: upload.pl
===================================================================
--- upload.pl (revision 3113)
+++ upload.pl (working copy)
@@ -123,7 +123,12 @@
{
if (upload("$ZipDir/$File"))
{
- $failures=0;
+ $failures-=1;
+ $failures/=2;
+ if($failures < 0)
+ {
+ $failures = 0;
+ }
}
else
{
@@ -142,6 +147,11 @@
statusMessage($failures . " consecutive upload failures, sleeping for " . $sleepdelay . " seconds", $Config{Verbose}, $currentSubTask, $progressJobs, $progressPercent,0);
sleep ($sleepdelay);
}
+ if (open(FAILFILE, ">", $failFile))
+ {
+ print FAILFILE $failures;
+ close FAILFILE;
+ }
}
}