[OSM-dev] Tile generation performance

Lars Aronsson lars at aronsson.se
Thu Jul 20 22:44:27 BST 2006


For a couple of years I have been running a cron script that takes 
a list of URLs and polls them to measure the response time and get 
statistics on server uptime/downtime.  This script runs every 2 
hours = 12 times a day = 84 times a week.  However, the script 
doesn't work well for websites that run Squid caches, since it 
retrieves the same URL again 2 hours later, and will always get a 
cached copy.  So last week I modified it to cleverly alter the URL 
it requests from the OSM tile generator, and now I've just got the 
first 100 samples (8 days).
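
A minimal sketch of such a poller, in Python.  How the URL is 
actually altered is not spelled out here; the sketch assumes the 
simplest approach, a throwaway query parameter (the name "nocache" 
is hypothetical) that makes every request look new to a cache that 
keys on the full URL.

```python
import time
import urllib.request


def cache_busting_url(base_url, token):
    """Append a throwaway query parameter so each poll uses a distinct URL.

    The parameter name "nocache" is an assumption for illustration only.
    """
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}nocache={token}"


def timed_fetch(url, timeout=60.0):
    """Fetch url and return (http_status, body_size_in_bytes, elapsed_seconds)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()
        status = response.status
    return status, len(body), time.perf_counter() - start
```

Called from cron, something like 
timed_fetch(cache_busting_url(tile_url, int(time.time()))) would 
give one sample per run, with tile_url standing in for the real 
tile address.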

The URLs are for a single tile at zoom=12 from the Stockholm 
region.  Some URLs might have lots of streets, others have none.  
The returned tiles have sizes varying from 1.8 to 4.5 kbytes. 
Beforehand I guessed that the response time would vary a lot with 
the time of day, but I haven't seen any of that.  All requests 
have received an HTTP status 200 (OK); none have returned errors or 
timed out.  The servers had an excellent availability this week.

The ping roundtrip time must be taken into account, since no 
signals can travel faster than light through an optic fiber.  The 
script runs from Linköping, Sweden.  A simple ping to 
wiki.openstreetmap.org has a response time of 34 milliseconds.  
When I retrieve the OSM logo (images/mag_map-120x120.png), a 
static image file of 24 kbytes, response times range from 0.18 to 
0.98 seconds with an average of 0.39 seconds.

None of the tile calls were faster than 1.94 seconds.  Half of 
them were faster than 2.65 seconds, 80% faster than 4.25 seconds, 
90% faster than 7.44 seconds.  Only 4% were slower than 10 
seconds.
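
For anyone who wants to compute the same kind of summary from their 
own samples, here is a small Python sketch.  It uses the 
nearest-rank definition of a percentile, which is an assumption; 
the figures above may have been derived slightly differently.  The 
function names are made up for illustration.

```python
import math


def percentile(samples, fraction):
    """Nearest-rank percentile: the smallest sample with at least
    `fraction` of all samples less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(samples)
    rank = math.ceil(fraction * len(ordered))
    return ordered[rank - 1]


def summarize(samples):
    """Summarize response times (in seconds) the way the figures are quoted."""
    return {
        "min": min(samples),
        "p50": percentile(samples, 0.50),
        "p80": percentile(samples, 0.80),
        "p90": percentile(samples, 0.90),
        "share_over_10s": sum(1 for s in samples if s > 10.0) / len(samples),
    }
```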

While 4.25 or 10 seconds are disasters in themselves, my 
impression -- based on experience from other applications -- is 
that this spread is very low.  Half of all tiles are generated in 
a narrow 1.94 to 2.65 second span.  That means we have a tile 
generator that runs as reliably and predictably as a diesel 
engine.  The very few exceptionally slow cases can probably be 
helped by just adding more hardware.  The *variation* in response 
time is not our problem.

The real surprise -- and a disaster for the user experience -- is 
that no tiles are generated faster than 1.94 seconds.  This is 
just as absurd as if the tile generation script started with a 
sleep(1.7) call.  That is 50 times the 34 ms ping roundtrip.

So how can we find out what part of tile generation takes these 
two seconds?  Is it some unnecessary initialization that is called 
over and over again, for every new tile?  Does it do a DNS lookup 
that fails and times out after 1.7 seconds, after which it 
continues to do the real work?  Or does one of the SQL calls 
accidentally cause a full table scan instead of using an index?  Is 
there a hidden sleep() call?
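
One way to answer these questions is to wrap each phase of the 
request in a timer and log the totals.  The sketch below is in 
Python purely for illustration and is not taken from the tile 
generator; the class and the stage names are hypothetical.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage of a request."""

    def __init__(self):
        self.elapsed = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the stage raised an exception.
            spent = time.perf_counter() - start
            self.elapsed[name] = self.elapsed.get(name, 0.0) + spent

    def report(self):
        """Return (stage, seconds) pairs, slowest stage first."""
        return sorted(self.elapsed.items(), key=lambda item: item[1], reverse=True)
```

Wrapping, say, "startup", "dns", "sql" and "render" blocks in 
"with timer.stage(...)" and printing timer.report() once per tile 
would show which phase accounts for the two seconds.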


-- 
  Lars Aronsson (lars at aronsson.se)
  Aronsson Datateknik - http://aronsson.se



