[OSM-dev] Database Schema (was Re: [OSM-talk] Oh dear - planet has duplicate id's)

Sat Jun 23 14:52:37 BST 2007

On 6/23/07, SteveC <steve at asklater.com> wrote:
>
>
> On 23 Jun 2007, at 12:02, Tom Hughes wrote:
>
> In message <233e28ed4e.tom at loxley.compton.nu>
>           Tom Hughes <tom at compton.nu> wrote:
>
> I now have a migration script which produces a schema that more or
> less matches the create_database.sql script in the repository.
>
> The only place where it doesn't match is two columns which have been
> declared with an unsigned qualifier in the SQL script, and I propose
> that we drop the qualifier from those - in at last one case other
> columns that are logically the same type don't have the qualifier
> anyway.
>
> I would therefore like to propose a plan of action:
>
>   1. Somebody dumps the actual schema from the live database so I can
>      make sure it matches create_database.sql and correct anything in
>      my migration script that doesn't quite match the live schema.
>
>   2. Commit my fixed migration script and run an SQL script provided
>      by me to drop those unsigned qualifiers from the live database
>      so that the two are fully in sync.
>
>
> These first two steps are now complete, so the migration script in
> the repository now produces a databases which matches the live one.
>
>   3. I produce a new migration script to migrate the database to a
>      new schema version that fixes the missing unique indexes and
>      any other oddidities in the current schema.
>
>
> I've had a look through the schema and have a series of proposals to
> clean things up, which (on a table by table basis) look like:
>
>   current_nodes
>
>     - Make the index on id a primary key, thus guaranteeing uniqueness.
>
>     - Split the current index on lat+lon into two separate indexes
>       for each axis - as it stands the lon component of the index can
>       only be used when exactly one latitude has been selected (more
>       or less impossible as it is floating point) so is never used. With
>       separate indexes the database will be able to decide which is most
>       efficient for each query and the keys will be smaller so there
>       will be less I/O for whichever index it uses.
>
>   current_segments
>
>     - Make the index on id a primary key, thus guaranteeing uniqueness.
>
>     - Drop the visible component from the id index as it is pointless
>       if the id component is unique anyway. If an index to help find
>       only visible components is needed then visible should come first.
>
>   current_way_segments
>
>     - Make segment_id bigint(64) to match other tables.
>
>     - Make the index on id a primary key by adding sequence_id.
>
>   current_way_tags
>
>     - No changes proposed.
>
>   current_ways
>
>     - No changes proposed.
>
>   friends
>
>     - Add index on user_id so that a user's friends can be found
>       efficiently.
>
>   gps_points
>
>     - Make gpx_id bigint(64) to match other tables.
>
>     - Drop user_id column and index as this information can be
>       found by joining to gpx_files and recording the user_id for
>       every point seems pointless.
>
>     - Split lat/lon index into two separate indexes, dropping the
>       user_id component - reasons for this are the same as for nodes.
>
>   gpx_gile_tags
>
>     - Drop the id field and primary key as they don't seem to serve
>       any purpose.
>
>   gpx_files
>
>     - What is the visible field for here? There are a few records
>       with it set to zero but the code ignores it - if it isn't
>       used then let's remove it.
>
>
> isnt that whats used to delete them?
>

If it is used for deletion purposes, rename the field.  Visible is hardly a
name for showing if the record is deleted or not.

gpx_pending_files
>
>     - This seems to be unused so let's remove it.
>
>   messages
>
>     - Drop user_id column as it serves no purpose (from_user_id
>       and to_user_id record who the message is from and to) and
>       it is breaking the message functionality at the moment.
>
>     - Add index on to_user_id so we can find a user's messages
>       efficiently.
>
>     - Drop from_display_name column and index as although it is
>       filled in it is never used, and the information is available
>       by joining to the user table anyway.
>
>   meta_areas
>
>     - Seems to be unused (but has a few records) and it presumbly
>       part of the old area code so should be dropped.
>
>   nodes
>
>     - Split lat/lon index in to as for current_nodes.
>
>   segments
>
>     - No changes proposed.
>
>   users
>
>     - Make email and display name indexes unique.
>
>   way_segments
>
>     - Make segment_id bigint(64) to match other tables.
>
>   way_tags
>
>     - No changes proposed.
>
> Please let me know what you think folks...
>
> Tom
>
> --
> Tom Hughes (tom at compton.nu)
> http://www.compton.nu/
>
> _______________________________________________
> dev mailing list
> dev at openstreetmap.org
> http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
>
>
> have fun,
>
> SteveC | steve at asklater.com | http://www.asklater.com/steve/
>
>
>
> _______________________________________________
> dev mailing list
> dev at openstreetmap.org
> http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
>
>

-- 
Ray Booysen
raybooysen at rjb.za.net
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/dev/attachments/20070623/999442d2/attachment.html>