[Tagging] football or soccer ?

Colin Smale colin.smale at xs4all.nl
Mon Aug 2 09:38:44 BST 2010


  On 01/07/2010 15:25, Anthony wrote:
> On Thu, Jul 1, 2010 at 8:08 AM, John F. Eldredge <john at jfeldredge.com> wrote:
>
>     In fact, the technique of having the user select from a list of
>     words, but actually storing the value as an arbitrary ID
>     (generally numeric), is the recommended technique in database
>     design.  It is called "normalizing the database".
>
>
> Umm...no.  At least, not exactly.  If a single column is independent 
> from other columns, it is not necessary for normalization to store it 
> as an arbitrary ID.  (For example, if you have a database table 
> containing a driver's license number, date of birth, and hair color, 
> you generally wouldn't store the hair color as an arbitrary ID and 
> then have a separate table to look up the hair color.  It certainly 
> isn't necessary for normalization.  Assuming driver's license number 
> is your primary key, hair color is a fact about the key, the whole 
> key, and nothing but the key.)

Actually that would be exactly what you would do, assuming you want the 
list of colours to be controlled and finite. If you denormalise and put 
the text of the hair colour in the person table, you allow spelling 
variations, translations and other kinds of "noise", which is usually 
what you want to prevent. A real-life example would be the 
colour of a car in the registration database. My car is painted 
(according to the manufacturer) "Noir Nacre" but I wouldn't find "Noir 
Nacre" in the government database. It's black, however you look at it. 
Unless you are French, in which case it is "noir". Etc etc.
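
To make the car example concrete, here is a minimal sketch of a controlled colour list using Python's built-in sqlite3 module. The table names, column names, plate strings and the three-colour list are all my own assumptions for illustration, not taken from any real registration database.

```python
import sqlite3
from typing import Optional


def build_registry() -> sqlite3.Connection:
    """Create an in-memory registry with a controlled colour list."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE colour (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE car (
            plate     TEXT PRIMARY KEY,
            colour_id INTEGER NOT NULL REFERENCES colour(id)
        );
        """
    )
    # Hypothetical controlled list; a real registry would define its own.
    conn.executemany(
        "INSERT INTO colour (id, name) VALUES (?, ?)",
        [(1, "black"), (2, "white"), (3, "red")],
    )
    return conn


def register_car(conn: sqlite3.Connection, plate: str, colour_name: str) -> None:
    """Store a car, accepting only colours that are on the controlled list."""
    row = conn.execute(
        "SELECT id FROM colour WHERE name = ?", (colour_name,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown colour: {colour_name!r}")
    conn.execute(
        "INSERT INTO car (plate, colour_id) VALUES (?, ?)", (plate, row[0])
    )


def colour_of(conn: sqlite3.Connection, plate: str) -> Optional[str]:
    """Return the colour name for a plate, or None if the plate is unknown."""
    row = conn.execute(
        "SELECT colour.name FROM car "
        "JOIN colour ON colour.id = car.colour_id "
        "WHERE car.plate = ?",
        (plate,),
    ).fetchone()
    return None if row is None else row[0]
```

With this layout, register_car(conn, "EXAMPLE-1", "black") is accepted, while register_car(conn, "EXAMPLE-2", "Noir Nacre") raises ValueError because that name is not on the list.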

It starts with a question about the data model. Do you recognise colour 
as having a finite set of valid values, or is it really free text? The 
"OSM way" is to have everything as "free text" at a technical level, and 
to maintain any "list of valid values" by general consensus, although 
even this goes against the grain for some people. As the profile of OSM 
improves in the market for cartographic data, it will become 
increasingly important to demonstrate that the data has some kind of 
quality control.
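
As a rough sketch of what "free text plus a list maintained by consensus" could look like in practice: the values stay free text, and a separate check reports anything that is not on the agreed list. The list contents and the function name below are hypothetical; this is not an existing OSM tool.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Set

# Hypothetical agreed list; in OSM this would live in documentation, not code.
AGREED_SPORT_VALUES = frozenset({"football", "tennis", "golf"})


def unknown_values(
    elements: Iterable[Mapping[str, str]], key: str, agreed: Set[str]
) -> Dict[str, int]:
    """Count values of `key` that are not on the agreed list.

    Nothing is rejected or rewritten: the data stays free text and the
    result is only a report for a human to review.
    """
    counts = Counter(
        tags[key]
        for tags in elements
        if key in tags and tags[key] not in agreed
    )
    return dict(counts)
```

Running it over a handful of tag dictionaries gives a per-value count of the outliers, which is about as much "quality control" as you can demonstrate without changing the data model.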

The discussion about football vs. soccer is not one about what it IS, 
but what it is CALLED. British English is the base language for OSM, so 
the main tag value should be "sport=football". Just as German-speakers 
are free (encouraged?) to use "sport:de=fussball", why should it not be 
"sport:us=soccer" in the USA?

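A small sketch of how a data consumer might read such suffixed keys, falling back to the main tag when no variant is present. Note that "sport:de" and "sport:us" are only the keys proposed in this message, and the function name is made up for the example.

```python
from typing import Mapping, Optional


def value_for(tags: Mapping[str, str], key: str, variant: str) -> Optional[str]:
    """Return the variant-specific value if tagged, else the main value."""
    return tags.get(f"{key}:{variant}", tags.get(key))


# Example tags following the proposal in this message.
pitch = {"sport": "football", "sport:de": "fussball", "sport:us": "soccer"}
```

Here value_for(pitch, "sport", "us") gives "soccer", and a variant that has not been tagged (say "fr") falls back to "football".
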
> If you're using a crappy "DBMS" you might do this anyway, not for 
> normalization, but for performance purposes, because the DBMS is too 
> stupid to do it automatically behind the scenes for you.  If you're 
> using a good DBMS, it won't be necessary, though.

Which DBMS do you call crappy and which do you call good?

Colin

