[OSM-dev] Full History Tables
Brett Henderson
brett at bretth.com
Mon Aug 6 13:31:18 BST 2007
Hi All,
Is there any way of getting access to a full copy of the production
database?
I'm writing queries to dump the database in a similar fashion to
planet.rb but in a way that produces a consistent snapshot of the
database at a point in time. This is essential to beginning a
replication process where all subsequent reads will extract a diff
between two points in time. I'm using planet files as a source of data
but presumably the production history tables contain a far larger data
set than is contained in the planet. What works on my test database may
not work on a database with full history. I'm also concerned about
"funky" data in the production database such as multiple rows with
identical timestamps, etc.
I specifically need the history tables. If privacy is an issue, the
user_id column may be set to null as I don't read it anyway. Running
this on a live database at this point is probably not advisable because
the history tables are MyISAM and will presumably prevent any updates
from occurring while they're running.
Osmosis appears to be working properly now and can replicate changes
between two mysql databases. It can do this via xml files as an
intermediate step or directly if required. The remaining tasks are the
hard ones, tuning database queries. I'm no expert at the black art of
query tuning but I'm slowly making progress. If anybody has experience
in these matters, has some free time, and is interested, I'd love some
help :-)
Cheers,
Brett
PS. These are some example queries showing what I'm trying to do. Note
that the ORDER BY clauses are in the wrong place and will probably bite
me with a true unordered table ... unfortunately moving it to the outer
query really kills performance.
**** NODES ****
With the following index created ...
ALTER TABLE `osm`.`nodes` ADD INDEX `nodes_idtstamp_idx`(`id`, `timestamp`);
The following query takes less than 30s to begin returning data. I've
been unable to do any better than this.
SELECT n.id, n.timestamp, n.latitude, n.longitude, n.tags
FROM nodes n
INNER JOIN
(
SELECT id, MAX(timestamp) AS timestamp
FROM nodes
WHERE timestamp < '2008-01-01 00:00:00'
GROUP BY id
ORDER BY id
) n2 ON n.id = n2.id AND n.timestamp = n2.timestamp
WHERE visible = 1
**** WAY TAGS ****
The following query takes less than 30s to begin returning data. I
haven't tried too hard yet, but I haven't had much luck improving this
either.
SELECT wt.id as way_id, wt.k, wt.v
FROM way_tags wt
INNER JOIN
(
SELECT id, MAX(version) AS version
FROM ways
WHERE timestamp < '2008-01-01 00:00:00'
GROUP BY id
ORDER BY id
) w ON wt.id = w.id AND wt.version = w.version
More information about the dev
mailing list