[OSM-dev] Too many slow queries in db

Tom Hughes tom at compton.nu
Tue Sep 4 11:21:03 BST 2007

In message <46DD2980.5050908 at siliconcircus.com>
        Jon Bright <jon at siliconcircus.com> wrote:

> Tom Hughes wrote:
>> Seriously, if you've got a magic way to make a database query
>> against a 100 million row table in the blink of an eyelid I'll
>> be only too happy to deploy it.
> We have a 3.4 billion row table and various other tables ranging from
> 500 million rows to 2 billion or so, all running under MySQL using
> MyISAM.  Queries on the 3.4 billion row table run in more or less the
> blink of an eyelid (test queries range from 0.03 to 1.5s).  In another
> more insert-heavy table (3 insert/second more or less continuously,
> peaking somewhere around 20 inserts/second) albeit with only 20-odd
> million rows, the same is true.  This is all commodity hardware (if
> mildly expensive commodity hardware).

You've probably got an index that lets you go straight to the right
records, though. Our problem is that we have three tables - the two
node tables and the GPS point table - where most queries have to be
resolved in two dimensions to find the records in a given bounding box.

A linear index on lat or lon is not much help, as it still selects a
stripe right around the world along one axis or the other.

I hope that indexing based on some of the ideas somebody has helpfully
put on the wiki at http://wiki.openstreetmap.org/index.php/QuadTiles
will make the index selection work much better.
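The core of the quadtile idea is to interleave the bits of the scaled
latitude and longitude into a single integer, so that points near each
other on the map tend to be near each other in index order. A rough
sketch (my own illustration - the scaling and bit count here are
assumptions, not necessarily what the wiki page or any eventual schema
would use):

```python
def quadtile(lat, lon, bits=16):
    """Interleave `bits` bits each of scaled longitude and latitude
    into a single integer.  Nearby points then mostly share a common
    bit prefix, so they sort close together.  (Illustrative sketch
    only; scaling and bit count are assumptions.)"""
    x = int((lon + 180.0) / 360.0 * (1 << bits))  # 0 .. 2^bits - 1
    y = int((lat + 90.0) / 180.0 * (1 << bits))
    x = min(x, (1 << bits) - 1)  # clamp the lon = 180 edge case
    y = min(y, (1 << bits) - 1)  # clamp the lat = 90 edge case
    tile = 0
    for i in range(bits - 1, -1, -1):
        tile = (tile << 1) | ((x >> i) & 1)  # one bit of longitude...
        tile = (tile << 1) | ((y >> i) & 1)  # ...then one of latitude
    return tile
```

A single B-tree index on a column like that can then answer a bounding
box query by scanning a handful of contiguous tile ranges instead of a
whole stripe of the world.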

> None of this helps you, of course.  But in order to be able to help,
> it would be necessary to see more details of your query load.  In
> particular, is the slow query log available to look at somewhere?  Are
> the mysql config and show status outputs at
> http://wiki.openstreetmap.org/index.php/Database still current?

The slow queries will essentially all be along the lines of:

  SELECT ... FROM current_nodes
  WHERE latitude BETWEEN ... AND ...
    AND longitude BETWEEN ... AND ...

and things like this:

  SELECT ... FROM gps_points
  WHERE latitude BETWEEN ... AND ...
    AND longitude BETWEEN ... AND ...
  ORDER BY timestamp
  LIMIT 10000, 5000

A secondary problem with the gps_points query is that the table is a
MyISAM table: somebody starts querying it (which may take several
minutes) and holds a read lock, the GPS import daemon then queues for
a write lock, and any further readers that come along block behind
that pending write lock.

Pretty soon every one of the API daemons is sat waiting for a read
lock on the table in order to answer a request for GPS points in a
given area and the whole API grinds to a halt.
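That queueing behaviour is the classic write-preferring lock pattern;
a toy model of it (my own sketch, not MyISAM's actual implementation)
looks like this:

```python
import threading

class WritePreferringTableLock:
    """Toy model of a MyISAM-style table lock: readers share the lock,
    but once a writer is queued, new readers must wait behind it even
    while an earlier reader still holds the lock."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active_readers = 0
        self._waiting_writers = 0
        self._writer_active = False

    def acquire_read(self):
        with self._cond:
            # New readers queue behind any pending writer - this is
            # what lets one slow reader stall every later reader.
            while self._writer_active or self._waiting_writers:
                self._cond.wait()
            self._active_readers += 1

    def release_read(self):
        with self._cond:
            self._active_readers -= 1
            if self._active_readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            self._waiting_writers += 1
            while self._writer_active or self._active_readers:
                self._cond.wait()
            self._waiting_writers -= 1
            self._writer_active = True

    def release_write(self):
        with self._cond:
            self._writer_active = False
            self._cond.notify_all()
```

One slow read plus one queued write is enough to make every subsequent
acquire_read() block, which is exactly the pile-up described above.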

This is not helped by the fact that the API will only fetch 5000 points
at a time (for server memory reasons), so each subsequent page of
results has to read through n times 5000 records to find the 5000
that it wants (with a sort into a temporary table first, of course!).
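If the pages were fetched by remembering the last timestamp seen rather
than by offset, each page could seek straight to its starting row. A
hypothetical sketch using SQLite (table and column names are
illustrative, not the live schema, and real data would need a unique
tie-breaker column alongside timestamp):

```python
import sqlite3

# Toy gps_points table: ids 1..100 with unique timestamps 0..99.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gps_points (id INTEGER PRIMARY KEY,"
             " timestamp INTEGER)")
conn.executemany("INSERT INTO gps_points (timestamp) VALUES (?)",
                 [(t,) for t in range(100)])

def page_by_offset(offset, size):
    # What the API does now: the server must sort and step over
    # `offset` rows before it can return anything.
    return conn.execute(
        "SELECT id, timestamp FROM gps_points"
        " ORDER BY timestamp LIMIT ? OFFSET ?",
        (size, offset)).fetchall()

def page_by_keyset(last_timestamp, size):
    # Seek pagination: jump straight past the previous page's last
    # timestamp; with an index on timestamp this reads only `size`
    # rows regardless of how deep into the results we are.
    return conn.execute(
        "SELECT id, timestamp FROM gps_points"
        " WHERE timestamp > ? ORDER BY timestamp LIMIT ?",
        (last_timestamp, size)).fetchall()
```

Both functions return the same pages; the difference is that the offset
version does O(offset + size) work per page while the seek version does
O(size).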


Tom Hughes (tom at compton.nu)
