[OSM-dev] Alternative PostgreSQL Schema
brett at bretth.com
Wed Oct 24 15:23:47 BST 2007
Tom Hughes wrote:
>> The current tables don't duplicate data, they just flag which records
>> are currently active.
>> The history tables contain the only full copy of data.
>> Current tables only allow a single version of an entity to be active at
>> one time.
>> The current tables contain a visible column but it has a check
>> constraint forcing it to true. Combined with a three part foreign key
>> of id, version and visible columns back to the history table, this
>> ensures that only visible records can be marked current. When a record
>> is logically deleted, the current table record is deleted.
> What's the logic behind not having deleted objects in the current
> table? Not having them there will certainly make it more complicated
> to decide whether to return 404 or 410 when an object is accessed
> which does not exist in the current table.
2 reasons. 1. I didn't know they were required and assumed the main
reason they existed in MySQL was to ensure ids were never duplicated.
2. To ensure references between data types (eg. ways containing nodes)
only refer to visible records.
Is there really a need for the API to distinguish between a 404 and a
410? Perhaps editors such as JOSM are using this?
I'm guessing the only time this occurs is when a single data element is
specifically requested on the API, so the overhead of running a second
query within the API shouldn't be too high. For example:
My API knowledge is fairly limited so I'm only speculating of course.
If necessary I can probably fix it by adding a visible column to all the
join tables and moving the check constraint to those. I'd prefer to
avoid that if possible though.
>> The MySQL relation_member table is split into three, one per osm type.
>> way_node and the relation_x tables ensure that if a current record is
>> deleted, all referential integrity back to parent data is verified.
> It looks like you're only doing referential integrity against the
> current node tables from the way nodes, but that won't work as an
> old version of a way may refer to a node that is no longer current
> or which has even been deleted.
Ah of course. That's a showstopper :-) Time for a rethink. Good thing
I posted this before getting too carried away ...
>> The tile column isn't shown on the node table. It may be possible to
>> remove the latitude/longitude columns and replace with a point column to
>> take advantage of PostgreSQL geo-spatial indexing support. Some
>> experimentation required here.
> The relative costs of using tiles or R-Tree indexing in Postgres need
> to be compared - both speedwise and sizewise (ie the number of keys
> which can be held in a given amount of memory for caching).
> Note that you don't need to use PostGIS or anything, just a simple R-Tree
> index in basic Postgres.
> I see you're also using double for lat/long rather than fixed point
> integers which would probably halve the size of those values.
Yep, much more work required here. Haven't put too much effort in here yet.
>> * Current tables aren't forced to point to the latest version within the
>> main table. I can't think of a way to do this but I don't think it is a
>> major issue. Bugs resulting in failure to update the current table will
>> soon be picked up due to data not showing up on subsequent queries.
>> Transaction support allows these writes to be all or nothing.
> There's also a question of how to handle version number allocation - we
> currently rely on an obscure feature of MyISAM tables to do it...
That's something else I was curious about. I had assumed that this was
being done in code. In other words using a sequence to allocate the
main id then incrementing in code with any concurrent edits resulting in
a key violation.
>> * Current tables require a visible field which only exists so that only
>> visible data can be added to the current table. Can't think of a better
>> way to do this.
> That is pretty ugly, but I can't see a better way either.
I'm stuck in meetings again most of tomorrow so that should give me some
more time to think this through ;-)
More information about the dev