[OSM-dev] Alternative PostgreSQL Schema

Tom Hughes tom at compton.nu
Wed Oct 24 15:03:20 BST 2007


In message <471F4581.4050908 at bretth.com>
        Brett Henderson <brett at bretth.com> wrote:

> I've been starting to play with PostgreSQL to learn more about it and to 
> see what it is capable of.
>
> I've started working on a schema to support referential integrity of osm 
> data.  It solves several issues with the existing MySQL database.  I'd 
> appreciate any thoughts or feedback:
> 1. Referential integrity between all data types including verification 
> that referred objects are visible.
> 2. No duplication of data between history tables and current tables 
> other than the primary key and the visible flag.
> 3. Transaction support.  This comes for free with PostgreSQL.
>
> An ER diagram is available here:
> http://www.bretth.com/osm/pgsql_erd1.jpg
> I don't have a script yet because it was quicker to draw a diagram than 
> to actually create all of the table definitions.

Look interesting...

> The current tables don't duplicate data, they just flag which records 
> are currently active.
> The history tables contain the only full copy of data.
> Current tables only allow a single version of an entity to be active at 
> one time.
> The current tables contain a visible column but it has a check 
> constraint forcing it to true.  Combined with a three part foreign key 
> of id, version and visible columns back to the history table, this 
> ensures that only visible records can be marked current.  When a record 
> is logically deleted, the current table record is deleted.

What's the logic behind not having deleted objects in the current
table? Not having them there will certainly make it more complicated
to decide whether to return 404 or 410 when an object is accessed
which does not exist in the current table.

> The MySQL relation_member table is split into three, one per osm type.
> way_node and the relation_x tables ensure that if a current record is 
> deleted, all referential integrity back to parent data is verified.

It looks like you're only doing referential integrity against the
current node tables from the way nodes, but that won't work as an
old version of a way may refer to a node that is no longer current
or which has even been deleted.

> The tile column isn't shown on the node table.  It may be possible to 
> remove the latitude/longitude columns and replace with a point column to 
> take advantage of PostgreSQL geo-spatial indexing support.  Some 
> experimentation required here.

The relative costs of using tiles or R-Tree indexing in Postgres need
to be compared - both speedwise and sizewise (ie the number of keys
which can be held in a given amount of memory for caching).

Note that you don't need to use PostGIS or anything, just a simple R-Tree
index in basic Postgres.

I see you're also using double for lat/long rather than fixed point
integers which would probably halve the size of those values.

> * Current tables aren't forced to point to the latest version within the 
> main table.  I can't think of a way to do this but I don't think it is a 
> major issue.  Bugs resulting in failure to update the current table will 
> soon be picked up due to data not showing up on subsequent queries.  
> Transaction support allows these writes to be all or nothing.

There's also a question of how to handle version number allocation - we
currently rely on an obscure feature of MyISAM tables to do it...

> * Current tables require a visible field which only exists so that only 
> visible data can be added to the current table.  Can't think of a better 
> way to do this.

That is pretty ugly, but I can't see a better way either.

Tom

-- 
Tom Hughes (tom at compton.nu)
http://www.compton.nu/




More information about the dev mailing list