[OSM-dev] Database Schema (was Re: [OSM-talk] Oh dear - planet has duplicate id's)

Sat Jun 23 23:09:22 BST 2007

Hi,

> I've had a look through the schema and have a series of proposals to
> clean things up, which (on a table by table basis) look like:

[...]

It's great to see something happen here, I belive a ton lot of 
performance can be squeezed out of mysql if it is done right.

I know you're just tying up loose ends and cleaning up, and you're 
planning enhancements for later, but I cannot resist the temptation to 
voice, once again, my two major enhancement wishes:

(1)

Move the individual "versioning" tags (whodunnit + when) per object to a 
"change group" table. This table contains change group id, user id, 
timestamp, client software used, and optional comment. All objects then 
just refer to the change group id by which they were created (saving a 
lot of space). Enhance the API so that all CRUD requests may include an 
optional change group id, and add a "create change group" call that 
returns a new change group id. (Possibly also "close change group", 
dunno.) Initially, using changegroups could be optional (they could be 
created on-the-fly if an uploaded object didn't have one), but as 
clients take up the idea, they should soon become required.

This will make it so much easier to spot problems, display good version 
histories (including user's comments which could be sent when creating 
the changegroup - like a SVN commit message), and also so selective 
rollbacks if things go awry.

(2)

Devise a way how CRUD messages can be passed to other servers in 
real-time (or near realtime), based on a "subscription list" that may 
(at a later stage perhaps) also be filtered. Set up one other server 
initially that receives all these messages and uses them to keep an 
identical copy of the "current" tables. Start serving all "read" 
requests from that database instance, and also select a small number of 
off-site mirror servers to receive a "full feed" (they, in turn, may 
sub-distribute data of course).

This will take a lot of strain from the central database which becomes 
the write-only master; distribution of current data through an 
arbitrary-size "trickle-down" tree will make data available where it is 
needed, creating really cool opportunities like live mapnik rendering - 
and best of all: *Everybody* who wants it can set up their own live 
mapnik rendering engine since you'll always find a node in the tree to 
attach to.

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00.09' E008°23.33'