[OSM-dev] Full History Tables

Brett Henderson brett at bretth.com
Mon Aug 6 13:31:18 BST 2007


Hi All,

Is there any way of getting access to a full copy of the production 
database?

I'm writing queries to dump the database in a similar fashion to 
planet.rb but in a way that produces a consistent snapshot of the 
database at a point in time.  This is essential to beginning a 
replication process where all subsequent reads will extract a diff 
between two points in time.  I'm using planet files as a source of data 
but presumably the production history tables contain a far larger data 
set than is contained in the planet.  What works on my test database may 
not work on a database with full history.  I'm also concerned about 
"funky" data in the production database such as multiple rows with 
identical timestamps, etc.

I specifically need the history tables.  If privacy is an issue, the 
user_id column may be set to null as I don't read it anyway.  Running 
this on a live database at this point is probably not advisable because 
the history tables are MyISAM and will presumably prevent any updates 
from occurring while they're running.

Osmosis appears to be working properly now and can replicate changes 
between two mysql databases.  It can do this via xml files as an 
intermediate step or directly if required.  The remaining tasks are the 
hard ones, tuning database queries.  I'm no expert at the black art of 
query tuning but I'm slowly making progress.  If anybody has experience 
in these matters, has some free time, and is interested, I'd love some 
help :-)

Cheers,
Brett




PS.  These are some example queries showing what I'm trying to do.  Note 
that the ORDER BY clauses are in the wrong place and will probably bite 
me with a true unordered table ... unfortunately moving it to the outer 
query really kills performance.

**** NODES ****
With the following index created ...

ALTER TABLE `osm`.`nodes` ADD INDEX `nodes_idtstamp_idx`(`id`, `timestamp`);

The following query takes less than 30s to begin returning data.  I've 
been unable to do any better than this.

SELECT n.id, n.timestamp, n.latitude, n.longitude, n.tags
FROM nodes n
INNER JOIN
(
SELECT id, MAX(timestamp) AS timestamp
FROM nodes
WHERE timestamp < '2008-01-01 00:00:00'
GROUP BY id
ORDER BY id
) n2 ON n.id = n2.id AND n.timestamp = n2.timestamp
WHERE visible = 1


**** WAY TAGS ****
The following query takes less than 30s to begin returning data.  I 
haven't tried too hard yet, but I haven't had much luck improving this 
either.

SELECT wt.id as way_id, wt.k, wt.v
FROM way_tags wt
INNER JOIN
(
SELECT id, MAX(version) AS version
FROM ways
WHERE timestamp < '2008-01-01 00:00:00'
GROUP BY id
ORDER BY id
) w ON wt.id = w.id AND wt.version = w.version






More information about the dev mailing list