[OSM-dev] Database Schema (was Re: [OSM-talk] Oh dear - planet has duplicate id's)

Sat Jun 23 12:02:07 BST 2007

In message <233e28ed4e.tom at loxley.compton.nu>
          Tom Hughes <tom at compton.nu> wrote:

> I now have a migration script which produces a schema that more or
> less matches the create_database.sql script in the repository.
> 
> The only place where it doesn't match is two columns which have been
> declared with an unsigned qualifier in the SQL script, and I propose
> that we drop the qualifier from those - in at last one case other
> columns that are logically the same type don't have the qualifier
> anyway.
> 
> I would therefore like to propose a plan of action:
> 
>   1. Somebody dumps the actual schema from the live database so I can
>      make sure it matches create_database.sql and correct anything in
>      my migration script that doesn't quite match the live schema.
> 
>   2. Commit my fixed migration script and run an SQL script provided
>      by me to drop those unsigned qualifiers from the live database
>      so that the two are fully in sync.

These first two steps are now complete, so the migration script in
the repository now produces a databases which matches the live one.

>   3. I produce a new migration script to migrate the database to a
>      new schema version that fixes the missing unique indexes and
>      any other oddidities in the current schema.

I've had a look through the schema and have a series of proposals to
clean things up, which (on a table by table basis) look like:

  current_nodes

    - Make the index on id a primary key, thus guaranteeing uniqueness.

    - Split the current index on lat+lon into two separate indexes
      for each axis - as it stands the lon component of the index can
      only be used when exactly one latitude has been selected (more
      or less impossible as it is floating point) so is never used. With
      separate indexes the database will be able to decide which is most
      efficient for each query and the keys will be smaller so there
      will be less I/O for whichever index it uses.

  current_segments

    - Make the index on id a primary key, thus guaranteeing uniqueness.

    - Drop the visible component from the id index as it is pointless
      if the id component is unique anyway. If an index to help find
      only visible components is needed then visible should come first.

  current_way_segments

    - Make segment_id bigint(64) to match other tables.

    - Make the index on id a primary key by adding sequence_id.

  current_way_tags

    - No changes proposed.

  current_ways

    - No changes proposed.

  friends

    - Add index on user_id so that a user's friends can be found
      efficiently.

  gps_points

    - Make gpx_id bigint(64) to match other tables.

    - Drop user_id column and index as this information can be
      found by joining to gpx_files and recording the user_id for
      every point seems pointless.

    - Split lat/lon index into two separate indexes, dropping the
      user_id component - reasons for this are the same as for nodes.

  gpx_gile_tags

    - Drop the id field and primary key as they don't seem to serve
      any purpose.

  gpx_files

    - What is the visible field for here? There are a few records
      with it set to zero but the code ignores it - if it isn't
      used then let's remove it.

  gpx_pending_files

    - This seems to be unused so let's remove it.

  messages

    - Drop user_id column as it serves no purpose (from_user_id
      and to_user_id record who the message is from and to) and
      it is breaking the message functionality at the moment.

    - Add index on to_user_id so we can find a user's messages
      efficiently.

    - Drop from_display_name column and index as although it is
      filled in it is never used, and the information is available
      by joining to the user table anyway.

  meta_areas

    - Seems to be unused (but has a few records) and it presumbly
      part of the old area code so should be dropped.

  nodes

    - Split lat/lon index in to as for current_nodes.

  segments

    - No changes proposed.

  users

    - Make email and display name indexes unique.

  way_segments

    - Make segment_id bigint(64) to match other tables.

  way_tags

    - No changes proposed.

Please let me know what you think folks...

Tom

-- 
Tom Hughes (tom at compton.nu)
http://www.compton.nu/