[OSM-dev] Database Schema (was Re: [OSM-talk] Oh dear - planet has duplicate id's)

Sat Jun 23 14:53:32 BST 2007

Sorry if I'm jumping in late, but I'm finding it a bit strange that on the
most important tables, we're missing primary keys.  Seems almost the first
thing you do when designing tables is chosing you primary key and setting
the index.

On 6/23/07, Tom Hughes <tom at compton.nu> wrote:
>
> In message <233e28ed4e.tom at loxley.compton.nu>
>           Tom Hughes <tom at compton.nu> wrote:
>
> > I now have a migration script which produces a schema that more or
> > less matches the create_database.sql script in the repository.
> >
> > The only place where it doesn't match is two columns which have been
> > declared with an unsigned qualifier in the SQL script, and I propose
> > that we drop the qualifier from those - in at last one case other
> > columns that are logically the same type don't have the qualifier
> > anyway.
> >
> > I would therefore like to propose a plan of action:
> >
> >   1. Somebody dumps the actual schema from the live database so I can
> >      make sure it matches create_database.sql and correct anything in
> >      my migration script that doesn't quite match the live schema.
> >
> >   2. Commit my fixed migration script and run an SQL script provided
> >      by me to drop those unsigned qualifiers from the live database
> >      so that the two are fully in sync.
>
> These first two steps are now complete, so the migration script in
> the repository now produces a databases which matches the live one.
>
> >   3. I produce a new migration script to migrate the database to a
> >      new schema version that fixes the missing unique indexes and
> >      any other oddidities in the current schema.
>
> I've had a look through the schema and have a series of proposals to
> clean things up, which (on a table by table basis) look like:
>
>   current_nodes
>
>     - Make the index on id a primary key, thus guaranteeing uniqueness.
>
>     - Split the current index on lat+lon into two separate indexes
>       for each axis - as it stands the lon component of the index can
>       only be used when exactly one latitude has been selected (more
>       or less impossible as it is floating point) so is never used. With
>       separate indexes the database will be able to decide which is most
>       efficient for each query and the keys will be smaller so there
>       will be less I/O for whichever index it uses.
>
>   current_segments
>
>     - Make the index on id a primary key, thus guaranteeing uniqueness.
>
>     - Drop the visible component from the id index as it is pointless
>       if the id component is unique anyway. If an index to help find
>       only visible components is needed then visible should come first.
>
>   current_way_segments
>
>     - Make segment_id bigint(64) to match other tables.
>
>     - Make the index on id a primary key by adding sequence_id.
>
>   current_way_tags
>
>     - No changes proposed.
>
>   current_ways
>
>     - No changes proposed.
>
>   friends
>
>     - Add index on user_id so that a user's friends can be found
>       efficiently.
>
>   gps_points
>
>     - Make gpx_id bigint(64) to match other tables.
>
>     - Drop user_id column and index as this information can be
>       found by joining to gpx_files and recording the user_id for
>       every point seems pointless.
>
>     - Split lat/lon index into two separate indexes, dropping the
>       user_id component - reasons for this are the same as for nodes.
>
>   gpx_gile_tags
>
>     - Drop the id field and primary key as they don't seem to serve
>       any purpose.
>
>   gpx_files
>
>     - What is the visible field for here? There are a few records
>       with it set to zero but the code ignores it - if it isn't
>       used then let's remove it.
>
>   gpx_pending_files
>
>     - This seems to be unused so let's remove it.
>
>   messages
>
>     - Drop user_id column as it serves no purpose (from_user_id
>       and to_user_id record who the message is from and to) and
>       it is breaking the message functionality at the moment.
>
>     - Add index on to_user_id so we can find a user's messages
>       efficiently.
>
>     - Drop from_display_name column and index as although it is
>       filled in it is never used, and the information is available
>       by joining to the user table anyway.
>
>   meta_areas
>
>     - Seems to be unused (but has a few records) and it presumbly
>       part of the old area code so should be dropped.
>
>   nodes
>
>     - Split lat/lon index in to as for current_nodes.
>
>   segments
>
>     - No changes proposed.
>
>   users
>
>     - Make email and display name indexes unique.
>
>   way_segments
>
>     - Make segment_id bigint(64) to match other tables.
>
>   way_tags
>
>     - No changes proposed.
>
> Please let me know what you think folks...
>
> Tom
>
> --
> Tom Hughes (tom at compton.nu)
> http://www.compton.nu/
>
> _______________________________________________
> dev mailing list
> dev at openstreetmap.org
> http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
>

-- 
Ray Booysen
raybooysen at rjb.za.net
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/dev/attachments/20070623/0dd3d290/attachment.html>