<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

    <title></title>

  </head>

  <body bgcolor="#ffffff" text="#000000">

    On 01/07/2010 15:25, Anthony wrote:

    <blockquote

      cite="mid:AANLkTinhPH6Gp_O_lPybItLS_3YhEg6iszc4WIeMtUdQ@mail.gmail.com"

      type="cite">On Thu, Jul 1, 2010 at 8:08 AM, John F. Eldredge <span

        dir="ltr">&lt;<a moz-do-not-send="true"
          href="mailto:john@jfeldredge.com">john@jfeldredge.com</a>&gt;</span>

      wrote:<br>

      <div class="gmail_quote">

        <blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt

          0.8ex; border-left: 1px solid rgb(204, 204, 204);

          padding-left: 1ex;">

          <div class="im">In fact, the technique of having the user

            select from a list of words, but actually storing the value

            as an arbitrary ID (generally numeric), is the recommended

            technique in database design.  It is called "normalizing the

            database".</div>

        </blockquote>

        <div><br>

        </div>

        <div>Umm...no.  At least, not exactly.  If a single column is

          independent from other columns, it is not necessary for

          normalization to store it as an arbitrary ID.  (For example,

          if you have a database table containing a driver's license

          number, date of birth, and hair color, you generally wouldn't

          store the hair color as an arbitrary ID and then have a

          separate table to look up the hair color.  It certainly isn't

          necessary for normalization.  Assuming driver's license number

          is your primary key, hair color is a fact about the key, the

          whole key, and nothing but the key.)</div>

      </div>

    </blockquote>

    <br>

    Actually that would be exactly what you would do, assuming you want

    the list of colours to be controlled and finite. If you denormalise

    and put the text of the hair colour in the person table, you are
    enabling spelling variations, translations and other kinds of
    "noise", which is usually what you want to prevent. A real-life

    example would be the colour of a car in the registration database.

    My car is painted (according to the manufacturer) "Noir Nacre", but I
    wouldn't find "Noir Nacre" in the government database. It's black,
    however you look at it. Unless you are French, in which case it is
    "noir". Etc., etc.<br>

    <br>
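    To make the lookup-table point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema, registrations and colour list are hypothetical. A colour can only be stored by referring to a row in the controlled list, so a manufacturer's marketing name such as "Noir Nacre" is rejected instead of being stored as yet another variant.<br>
<pre>
```python
import sqlite3

# Minimal sketch of a controlled colour list (hypothetical schema).
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE colour (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")
conn.execute(
    "CREATE TABLE vehicle ("
    " registration TEXT PRIMARY KEY,"
    " colour_id INTEGER NOT NULL REFERENCES colour(id))"
)
conn.executemany(
    "INSERT INTO colour (id, name) VALUES (?, ?)",
    [(1, "black"), (2, "white"), (3, "red")],
)


def register(registration, colour_name):
    """Store a vehicle, accepting only colours from the controlled list."""
    row = conn.execute(
        "SELECT id FROM colour WHERE name = ?", (colour_name,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown colour: {colour_name!r}")
    conn.execute("INSERT INTO vehicle VALUES (?, ?)", (registration, row[0]))


register("AB12 CDE", "black")

try:
    register("XY34 ZZZ", "Noir Nacre")  # marketing name, not in the list
except ValueError as err:
    print(err)  # unknown colour: 'Noir Nacre'

# Bypassing register() with an invented id is caught by the foreign key.
try:
    conn.execute("INSERT INTO vehicle VALUES ('XY34 ZZZ', 99)")
except sqlite3.IntegrityError as err:
    print(err)  # FOREIGN KEY constraint failed

# Reading the colour back is a join; the vehicle row holds only the id.
for registration, name in conn.execute(
    "SELECT v.registration, c.name FROM vehicle v JOIN colour c ON c.id = v.colour_id"
):
    print(registration, name)  # AB12 CDE black
```
</pre>
    Translations fit the same shape: a further table keyed on colour id and language would let "noir" and "black" name the same row.<br>
    <br>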

    It starts with a question about the data model. Do you recognise

    colour as having a finite set of valid values, or is it really free

    text? The "OSM way" is to have everything as "free text" at a

    technical level, and to maintain any "list of valid values" by

    general consensus, although even this goes against the grain for

    some people. As the profile of OSM improves in the market for

    cartographic data, it will become increasingly important to

    demonstrate that the data has some kind of quality control.<br>
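    <br>
    The "OSM way" described above keeps the data itself as free text and moves the list of valid values into a separate check. A minimal sketch of that kind of quality-control pass in Python; the agreed list and the sample features are made up for illustration, and this is not an existing OSM tool.<br>
<pre>
```python
from collections import Counter

# Hypothetical "list of valid values" agreed by consensus. It lives
# outside the data, so nothing stops a mapper from entering other values.
AGREED_VALUES = {
    "sport": {"football", "tennis", "basketball", "swimming"},
}

# Free-text tags as they might appear on map features (made-up sample).
features = [
    {"leisure": "pitch", "sport": "football"},
    {"leisure": "pitch", "sport": "Football"},
    {"leisure": "pitch", "sport": "footbal"},
    {"leisure": "pitch", "sport": "tennis"},
]


def audit(features, agreed):
    """Count tag values outside the agreed list, per (key, value).

    Nothing is rejected: the data stays free text and the report is
    only an aid for human review.
    """
    problems = Counter()
    for tags in features:
        for key, allowed in agreed.items():
            value = tags.get(key)
            if value is not None and value not in allowed:
                problems[(key, value)] += 1
    return problems


for (key, value), count in sorted(audit(features, AGREED_VALUES).items()):
    print(f"{key}={value}: {count} feature(s) not in the agreed list")
```
</pre>
    The contrast with the lookup table is where the check happens: here after the fact and by consensus, there at write time and by the schema.<br>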

    <br>

    The discussion about football vs. soccer is not one about what it

    IS, but what it is CALLED. British English is the base language for

    OSM, so the main tag should be "sport=football". Just as
    German speakers are free (encouraged?) to use "sport:de=fussball",
    why should it not be "sport:us=soccer" in the USA?<br>
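    <br>
    A small sketch of how a renderer or search tool might use such tags under the scheme proposed here: one base value in British English, plus optional local names under "key:suffix". The function name is hypothetical, and "sport:us" is the key being proposed in this message, not an established convention.<br>
<pre>
```python
# Sketch of the proposed scheme: a single base value plus optional
# localised names. display_value() is a hypothetical helper.


def display_value(tags, key, locale):
    """Return the name to show for `key` in `locale`, else the base value."""
    return tags.get(f"{key}:{locale}", tags.get(key))


pitch = {
    "leisure": "pitch",
    "sport": "football",   # base value, British English
    "sport:de": "fussball",
    "sport:us": "soccer",
}

print(display_value(pitch, "sport", "de"))  # fussball
print(display_value(pitch, "sport", "us"))  # soccer
print(display_value(pitch, "sport", "fr"))  # football (no sport:fr, base value used)
```
</pre>
    Because the base value is the same everywhere, a query on sport=football still finds every pitch, whatever it is called locally.<br>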

    <br>

    <blockquote

      cite="mid:AANLkTinhPH6Gp_O_lPybItLS_3YhEg6iszc4WIeMtUdQ@mail.gmail.com"

      type="cite">

      <div class="gmail_quote">

        <div>If you're using a crappy "DBMS" you might do this anyway,

          not for normalization, but for performance purposes, because

          the DBMS is too stupid to do it automatically behind the

          scenes for you.  If you're using a good DBMS, it won't be

          necessary, though.</div>

      </div>

    </blockquote>
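    <br>
    One reading of doing it "automatically behind the scenes" is dictionary encoding, which some column-oriented storage engines apply to repetitive string columns. A toy sketch in Python (the class is hypothetical, not any particular DBMS's internals): the application still stores plain strings, while the storage layer keeps each distinct string once plus a small integer code per row.<br>
<pre>
```python
# Toy dictionary-encoded column; DictionaryEncodedColumn is hypothetical.


class DictionaryEncodedColumn:
    def __init__(self):
        self.dictionary = []      # code to string, in first-seen order
        self.codes_by_value = {}  # string to code
        self.codes = []           # one small integer per row

    def append(self, value):
        code = self.codes_by_value.get(value)
        if code is None:
            # First time this string is seen: give it the next code.
            code = len(self.dictionary)
            self.dictionary.append(value)
            self.codes_by_value[value] = code
        self.codes.append(code)

    def __getitem__(self, row):
        # Decode a row back to its original string.
        return self.dictionary[self.codes[row]]


hair = DictionaryEncodedColumn()
for colour in ["brown", "black", "brown", "blond", "brown"]:
    hair.append(colour)

print(hair.codes)       # [0, 1, 0, 2, 0]
print(hair.dictionary)  # ['brown', 'black', 'blond']
print(hair[3])          # blond
```
</pre>
    This saves space but does not restrict the values: a new spelling such as "Noir Nacre" would simply get a new code. It addresses performance, not the controlled-list question.<br>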

    <br>

    Which DBMS do you call crappy and which do you call good?<br>

    <br>

    Colin<br>

    <br>

    <br>

  </body>

</html>