[OSM-dev] Proposed database improvement

Thu Aug 31 10:26:53 BST 2006

I have been performing various tests on osm queries and found an 
optimisation which will speed up the very queries which are generating 
our bottleneck.

The following queries are taking around 9 seconds to complete:

SELECT id, node_a, node_b, tags FROM current_segments where (node_a IN 
(<nodelist>) OR node_b IN (<nodelist>))  and visible = true

We have indexes on both node_a and node_b.

Each query will typically list 500 nodes, and be repeated as necessary 
for a given area. a 500 node query takes 9 seconds to complete. On 
investigation, the index for node b is not being used.

Solutions.

1. A compound index. This may not work given the selects are OR ed.
2. A union. This will generate duplicate lines unsorted
3. Two queries into a temporary table then select distinctrow

3 provides the exact bit-compatible output to the existing query at the 
expense of a little more complexity. This is the one I have chosen, 
although 2 may be adequate for the job, and could be faster still.

CREATE TEMPORARY TABLE
current_segments_temp LIKE current_segments; INSERT INTO
current_segments_temp SELECT * FROM current_segments where node_a IN
(<nodelist>) and visible = true; INSERT INTO current_segments_temp 
SELECT * FROM current_segments where node_b IN
(<nodelist>) and visible = true;SELECT DISTINCTROW id, node_a, node_b, 
tags FROM current_segments_temp ORDER BY id ASC'

Using this schema, we can further reduce API load by inserting the 
output of all select statements for a given list of nodes into the one 
temporary table in the single SQL connection, then performing distinctrow.

The new query takes 28 milliseconds to execute, compared to the existing 
9 seconds.