[OSM-dev] Alternative PostgreSQL Schema

Brett Henderson brett at bretth.com
Wed Oct 24 14:15:45 BST 2007


Hi All,

I've been starting to play with PostgreSQL to learn more about it and to 
see what it is capable of.

I've started working on a schema to support referential integrity of osm 
data.  It solves several issues with the existing MySQL database.  I'd 
appreciate any thoughts or feedback:
1. Referential integrity between all data types including verification 
that referred objects are visible.
2. No duplication of data between history tables and current tables 
other than the primary key and the visible flag.
3. Transaction support.  This comes for free with PostgreSQL.

An ER diagram is available here:
http://www.bretth.com/osm/pgsql_erd1.jpg
I don't have a script yet because it was quicker to draw a diagram than 
to actually create all of the table definitions.

The diagram doesn't show everything so it may require some explanation:
The user table is just there as a placeholder.
The current tables don't duplicate data, they just flag which records 
are currently active.
The history tables contain the only full copy of data.
Current tables only allow a single version of an entity to be active at 
one time.
The current tables contain a visible column but it has a check 
constraint forcing it to true.  Combined with a three part foreign key 
of id, version and visible columns back to the history table, this 
ensures that only visible records can be marked current.  When a record 
is logically deleted, the current table record is deleted.
As per existing MySQL schema, the history tables are never updated or 
deleted, only inserted and selected.
The MySQL relation_member table is split into three, one per osm type.
way_node and the relation_x tables ensure that if a current record is 
deleted, all referential integrity back to parent data is verified.
The tile column isn't shown on the node table.  It may be possible to 
remove the latitude/longitude columns and replace with a point column to 
take advantage of PostgreSQL geo-spatial indexing support.  Some 
experimentation required here.

It isn't perfect, these are some of the issues:
* Lots of tables.  But I think only one more than the existing schema.
* Current tables aren't forced to point to the latest version within the 
main table.  I can't think of a way to do this but I don't think it is a 
major issue.  Bugs resulting in failure to update the current table will 
soon be picked up due to data not showing up on subsequent queries.  
Transaction support allows these writes to be all or nothing.
* Current tables require a visible field which only exists so that only 
visible data can be added to the current table.  Can't think of a better 
way to do this.
* Circular references between relations are possible but I believe that 
is practically impossible to prevent in the data model itself.

This schema supports the existing 0.5 logical data model, just in a 
different physical db representation.  I'm also playing with an 
alternative schema that results in unique ids across all data types and 
reduces the number of tables significantly but that may be too drastic.

Any thoughts appreciated.

Cheers,
Brett





More information about the dev mailing list