[OSM-dev] Tile generation performance
lars at aronsson.se
Thu Jul 20 22:44:27 BST 2006
For a couple of years, I've been running a cron script that has a list
of URLs and polls them just to measure the response time and get
statistics on server uptime/downtime. This script runs every 2
hours = 12 times a day = 84 times a week. However, the script
doesn't work well for websites that run Squid caches, since it
retrieves the same URL again 2 hours later, and will always get a
cached copy. So last week I modified it to cleverly alter the URL
it requests from the OSM tile generator, and now I've just got the
first 100 samples (8 days).
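For anyone who wants to repeat the measurement, here is a minimal sketch in Python of a cache-defeating, timed request. It is not my actual script: the "nocache" parameter name and both function names are made up for illustration, and it assumes the cache keys on the full URL and that the server ignores unknown query parameters.

```python
import time
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busting_url(url, sample_number):
    """Return url with an extra query parameter that differs per sample.

    The parameter name "nocache" is hypothetical. A cache that keys on the
    full URL sees a new URL on every run and has to ask the origin server.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("nocache", str(sample_number)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def timed_fetch(url, timeout=60):
    """Fetch url and return (HTTP status, body size in bytes, elapsed seconds)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()
        status = response.status
    return status, len(body), time.perf_counter() - start
```

Calling timed_fetch(cache_busting_url(tile_url, n)) once per cron run, with n increasing each time, gives one sample per run.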
The URLs are for a single tile at zoom=12 from the Stockholm
region. Some URLs might have lots of streets, others have none.
The returned tiles have sizes varying from 1.8 to 4.5 kbytes.
Beforehand I guessed that the response time would vary a lot with
the time of day, but I haven't seen any of that. All requests
have received an HTTP status 200 (OK), none have returned errors or
timed out. The servers had excellent availability this week.
The ping roundtrip time must be taken into account, since no
signal can travel faster than light through an optical fiber. The
script runs from Linköping, Sweden. A simple ping to
wiki.openstreetmap.org has a response time of 34 milliseconds.
When I retrieve the OSM logo (images/mag_map-120x120.png), a
static image file of 24 kbytes, response times range from 0.18 to
0.98 seconds with an average of 0.39 seconds.
None of the tile calls were faster than 1.94 seconds. Half of
them were faster than 2.65 seconds, 80% faster than 4.25 seconds,
90% faster than 7.44 seconds. Only 4% were slower than 10 seconds.
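Figures like these can be derived from the raw samples with a few lines of Python. This is only a sketch: it uses the nearest-rank definition of a percentile, which is one convention among several, and the function names are my own.

```python
import math


def percentile(samples, percent):
    """Nearest-rank percentile: the smallest sample such that at least
    `percent` percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


def fraction_slower_than(samples, threshold):
    """Fraction of samples strictly greater than threshold (in seconds)."""
    return sum(1 for s in samples if s > threshold) / len(samples)
```

With a list of response times in seconds, min(times) gives the floor, percentile(times, 50) the median, and fraction_slower_than(times, 10) the share of very slow requests.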
While 4.25 or 10 seconds are disasters in themselves, my
impression -- based on experience from other applications -- is
that this spread is very low. Half of all tiles are generated in
a narrow 1.94 to 2.65 second span. That means we have a tile
generator that runs as reliably and predictably as a diesel
engine. The very few exceptionally slow cases can probably be
helped by just adding more hardware. The *variation* in response
time is not our problem.
The real surprise -- and a disaster for the user experience -- is
that no tiles are generated faster than 1.94 seconds. This is
just as absurd as if the tile generation script started with a
sleep(1.7) call. That is 50 times longer than the ping roundtrip.
So how can we find out what part of tile generation takes these
two seconds? Is it some unnecessary initialization that is called
over and over again, for every new tile? Does it do a DNS lookup
that fails and times out after 1.7 seconds, after which it
continues to do the real work? Or do any of the SQL calls
accidentally cause a full table scan instead of using an index? Is
there a hidden sleep() call?
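One generic way to answer these questions is to wrap each stage of the request in a timer and log the totals. The sketch below is Python and is not based on the actual tile generator code; the names timed_phase, phase_times and slowest_phase, and the phase labels in the usage note, are hypothetical.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per phase name.
phase_times = {}


@contextmanager
def timed_phase(name):
    """Add the wall-clock time spent inside the with-block to phase_times[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        phase_times[name] = phase_times.get(name, 0.0) + elapsed


def slowest_phase():
    """Return the name of the phase with the largest accumulated time."""
    return max(phase_times, key=phase_times.get)
```

Wrapping, say, the startup code, the database query and the image rendering in "with timed_phase('init'):", "with timed_phase('sql'):" and "with timed_phase('render'):" and printing phase_times at the end of each request should show where the two seconds go.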
Lars Aronsson (lars at aronsson.se)
Aronsson Datateknik - http://aronsson.se