[OSM-dev] Alternative PostgreSQL Schema
brett at bretth.com
Wed Oct 24 14:15:45 BST 2007
I've been starting to play with PostgreSQL to learn more about it and to
see what it is capable of.
I've started working on a schema to support referential integrity of osm
data. It solves several issues with the existing MySQL database. I'd
appreciate any thoughts or feedback:
1. Referential integrity between all data types including verification
that referred objects are visible.
2. No duplication of data between history tables and current tables
other than the primary key and the visible flag.
3. Transaction support. This comes for free with PostgreSQL.
An ER diagram is available here:
I don't have a script yet because it was quicker to draw a diagram than
to actually create all of the table definitions.
The diagram doesn't show everything so it may require some explanation:
The user table is just there as a placeholder.
The current tables don't duplicate data, they just flag which records
are currently active.
The history tables contain the only full copy of data.
Current tables only allow a single version of an entity to be active at
The current tables contain a visible column but it has a check
constraint forcing it to true. Combined with a three part foreign key
of id, version and visible columns back to the history table, this
ensures that only visible records can be marked current. When a record
is logically deleted, the current table record is deleted.
As per existing MySQL schema, the history tables are never updated or
deleted, only inserted and selected.
The MySQL relation_member table is split into three, one per osm type.
way_node and the relation_x tables ensure that if a current record is
deleted, all referential integrity back to parent data is verified.
The tile column isn't shown on the node table. It may be possible to
remove the latitude/longitude columns and replace with a point column to
take advantage of PostgreSQL geo-spatial indexing support. Some
experimentation required here.
It isn't perfect, these are some of the issues:
* Lots of tables. But I think only one more than the existing schema.
* Current tables aren't forced to point to the latest version within the
main table. I can't think of a way to do this but I don't think it is a
major issue. Bugs resulting in failure to update the current table will
soon be picked up due to data not showing up on subsequent queries.
Transaction support allows these writes to be all or nothing.
* Current tables require a visible field which only exists so that only
visible data can be added to the current table. Can't think of a better
way to do this.
* Circular references between relations are possible but I believe that
is practically impossible to prevent in the data model itself.
This schema supports the existing 0.5 logical data model, just in a
different physical db representation. I'm also playing with an
alternative schema that results in unique ids across all data types and
reduces the number of tables significantly but that may be too drastic.
Any thoughts appreciated.
More information about the dev