[OSM-dev] Proposed database improvement
Nick Hill
nick at nickhill.co.uk
Thu Aug 31 10:26:53 BST 2006
I have been performing various tests on osm queries and found an
optimisation which will speed up the very queries which are generating
our bottleneck.
The following queries are taking around 9 seconds to complete:
SELECT id, node_a, node_b, tags FROM current_segments where (node_a IN
(<nodelist>) OR node_b IN (<nodelist>)) and visible = true
We have indexes on both node_a and node_b.
Each query will typically list 500 nodes, and be repeated as necessary
for a given area. a 500 node query takes 9 seconds to complete. On
investigation, the index for node b is not being used.
Solutions.
1. A compound index. This may not work given the selects are OR ed.
2. A union. This will generate duplicate lines unsorted
3. Two queries into a temporary table then select distinctrow
3 provides the exact bit-compatible output to the existing query at the
expense of a little more complexity. This is the one I have chosen,
although 2 may be adequate for the job, and could be faster still.
CREATE TEMPORARY TABLE
current_segments_temp LIKE current_segments; INSERT INTO
current_segments_temp SELECT * FROM current_segments where node_a IN
(<nodelist>) and visible = true; INSERT INTO current_segments_temp
SELECT * FROM current_segments where node_b IN
(<nodelist>) and visible = true;SELECT DISTINCTROW id, node_a, node_b,
tags FROM current_segments_temp ORDER BY id ASC'
Using this schema, we can further reduce API load by inserting the
output of all select statements for a given list of nodes into the one
temporary table in the single SQL connection, then performing distinctrow.
The new query takes 28 milliseconds to execute, compared to the existing
9 seconds.
More information about the dev
mailing list