[OSM-dev] Database Schema (was Re: [OSM-talk] Oh dear - planet has duplicate id's)

Sat Jun 23 13:57:00 BST 2007

On 23 Jun 2007, at 12:02, Tom Hughes wrote:

> In message <233e28ed4e.tom at loxley.compton.nu>
>           Tom Hughes <tom at compton.nu> wrote:
>
>> I now have a migration script which produces a schema that more or
>> less matches the create_database.sql script in the repository.
>>
>> The only place where it doesn't match is two columns which have been
>> declared with an unsigned qualifier in the SQL script, and I propose
>> that we drop the qualifier from those - in at last one case other
>> columns that are logically the same type don't have the qualifier
>> anyway.
>>
>> I would therefore like to propose a plan of action:
>>
>>   1. Somebody dumps the actual schema from the live database so I can
>>      make sure it matches create_database.sql and correct anything in
>>      my migration script that doesn't quite match the live schema.
>>
>>   2. Commit my fixed migration script and run an SQL script provided
>>      by me to drop those unsigned qualifiers from the live database
>>      so that the two are fully in sync.
>
> These first two steps are now complete, so the migration script in
> the repository now produces a databases which matches the live one.
>
>>   3. I produce a new migration script to migrate the database to a
>>      new schema version that fixes the missing unique indexes and
>>      any other oddidities in the current schema.
>
> I've had a look through the schema and have a series of proposals to
> clean things up, which (on a table by table basis) look like:
>
>   current_nodes
>
>     - Make the index on id a primary key, thus guaranteeing  
> uniqueness.
>
>     - Split the current index on lat+lon into two separate indexes
>       for each axis - as it stands the lon component of the index can
>       only be used when exactly one latitude has been selected (more
>       or less impossible as it is floating point) so is never used.  
> With
>       separate indexes the database will be able to decide which is  
> most
>       efficient for each query and the keys will be smaller so there
>       will be less I/O for whichever index it uses.
>
>   current_segments
>
>     - Make the index on id a primary key, thus guaranteeing  
> uniqueness.
>
>     - Drop the visible component from the id index as it is pointless
>       if the id component is unique anyway. If an index to help find
>       only visible components is needed then visible should come  
> first.
>
>   current_way_segments
>
>     - Make segment_id bigint(64) to match other tables.
>
>     - Make the index on id a primary key by adding sequence_id.
>
>   current_way_tags
>
>     - No changes proposed.
>
>   current_ways
>
>     - No changes proposed.
>
>   friends
>
>     - Add index on user_id so that a user's friends can be found
>       efficiently.
>
>   gps_points
>
>     - Make gpx_id bigint(64) to match other tables.
>
>     - Drop user_id column and index as this information can be
>       found by joining to gpx_files and recording the user_id for
>       every point seems pointless.
>
>     - Split lat/lon index into two separate indexes, dropping the
>       user_id component - reasons for this are the same as for nodes.
>
>   gpx_gile_tags
>
>     - Drop the id field and primary key as they don't seem to serve
>       any purpose.
>
>   gpx_files
>
>     - What is the visible field for here? There are a few records
>       with it set to zero but the code ignores it - if it isn't
>       used then let's remove it.

isnt that whats used to delete them?

>
>   gpx_pending_files
>
>     - This seems to be unused so let's remove it.
>
>   messages
>
>     - Drop user_id column as it serves no purpose (from_user_id
>       and to_user_id record who the message is from and to) and
>       it is breaking the message functionality at the moment.
>
>     - Add index on to_user_id so we can find a user's messages
>       efficiently.
>
>     - Drop from_display_name column and index as although it is
>       filled in it is never used, and the information is available
>       by joining to the user table anyway.
>
>   meta_areas
>
>     - Seems to be unused (but has a few records) and it presumbly
>       part of the old area code so should be dropped.
>
>   nodes
>
>     - Split lat/lon index in to as for current_nodes.
>
>   segments
>
>     - No changes proposed.
>
>   users
>
>     - Make email and display name indexes unique.
>
>   way_segments
>
>     - Make segment_id bigint(64) to match other tables.
>
>   way_tags
>
>     - No changes proposed.
>
> Please let me know what you think folks...
>
> Tom
>
> -- 
> Tom Hughes (tom at compton.nu)
> http://www.compton.nu/
>
> _______________________________________________
> dev mailing list
> dev at openstreetmap.org
> http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
>

have fun,

SteveC | steve at asklater.com | http://www.asklater.com/steve/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/dev/attachments/20070623/f4b00d7c/attachment.html>