[OSM-dev] Too many slow queries in db
Tom Hughes
tom at compton.nu
Tue Sep 4 11:21:03 BST 2007
In message <46DD2980.5050908 at siliconcircus.com>
Jon Bright <jon at siliconcircus.com> wrote:
> Tom Hughes wrote:
>
>> Seriously, if you've got a magic way to make a database query
>> a 100 million row table in the blink of an eyelid I'll be only
>> to happy to deploy it.
>
> We have a 3.4 billion row table and various other tables ranging from
> 500 million rows to 2 billion or so, all running under MySQL using
> MyISAM. Queries on the 3.4 billion row table run in more or less the
> blink of an eyelid (test queries range from 0.03 to 1.5s). In another
> more insert-heavy table (3 insert/second more or less continuously,
> peaking somewhere around 20 inserts/second) albeit with only 20-odd
> million rows, the same is true. This is all commodity hardware (if
> mildly expensive commodity hardware).
You've probably got an index that lets you go straight to the right
records though. Our problem is that we have three tables - the two
node tables and the GPS point table where most queries need to be
resolved in two dimensions to find records in a given bounding box.
A linear index on lat or lon is not much help as it means working
with a stripe around the whole world in one axis or the other.
I hope that indexing using some of the ideas that somebody has helpfully
put on the wiki at http://wiki.openstreetmap.org/index.php/QuadTiles
will help make the index selections work much better.
> None of this helps you, of course. But in order to be able to help,
> it would be necessary to see more details of your query load. In
> particular, is the slow query log available to look at somewhere? Are
> the mysql config and show status outputs at
> http://wiki.openstreetmap.org/index.php/Database still current?
The slow queries will essentially all be along the lines of:
SELECT ... FROM current_nodes
WHERE latitude BETWEEN ... AND ...
AND longitude BETWEEN ... AND ...
and things like this:
SELECT ... FROM gps_points
WHERE latitude BETWEEN ... AND ...
AND longitude BETWEEN ... AND ...
ORDER BY timestamp
LIMIT 10000, 5000
A secondary problem with the gps_points one is that the table is a
MyISAM table so somebody starts querying it (which may take several
minutes) and holds a read lock, then the GPS import daemon queues for
a write lock and other readers come along and block for a read lock
behind that pending write lock.
Pretty soon every one of the API daemons is sat waiting for a read
lock on the table in order to answer a request for GPS points in a
given area and the whole API grinds to a halt.
This is not helped by the fact the API will only fetch 5000 points
at a time (for server memory reasons) and therefore each future page
of results has to read through n times 5000 records to find the 5000
that it wants (with a sort into a temporary table first of course!).
Tom
--
Tom Hughes (tom at compton.nu)
http://www.compton.nu/
More information about the dev
mailing list