[OSM-talk] Explicit tagging of name language

Ed Avis eda at waniasset.com
Mon Dec 6 10:00:57 GMT 2010


This message describes a problem that user interfaces such as Nominatim have when
choosing the correct localized 'name' tag to show to the user, explains why I
believe it is caused by incomplete tagging, and proposes a fix.

I was surprised when searching on the OSM home page to find Edinburgh was not
in Scotland but in 'Ecosse'.  My preferred language is English, although I do
understand a few words of French, and I had set my browser language preferences
accordingly.  The browser sends an HTTP Accept-Language header giving English a
better score than French.  So why does Nominatim think I would prefer to see the
French name for the country?

It is because it sees some tags like (simplified to illustrate):

    name=Scotland
    name:fr=Ecosse
    name:es=Escocia

Given that, and the user's preferred languages [en, fr], what name should be
picked?  The program cannot know that the name 'Scotland' is in English, so the
best course of action is to pick a name that the browser says it will accept.
If none of the names is tagged with an accepted language then it can fall back
to the ordinary 'name' tag as a last resort, but if some localized names are
there then they should be used.  The alternative would be no localization.

I have noticed similar problems when searching in the USA: someone added Serbian
Cyrillic names for the 50 states, which now pop up instead of the English names
because I have included Serbian in my language list, albeit with a tiny score.

I believe the answer, as so often, is to improve the tagging used so that
software has the information it needs.  In this case an explicit English-
language name should be added, so we have

    name=Scotland
    name:en=Scotland
    name:fr=Ecosse
    name:es=Escocia

(Another way to tag the same info would be to invent a new tag
'language_of_main_name=en' but this seems cumbersome and would not be understood
by existing software.)

In an attempt to fix this I have asked the maintainer of
<http://keepright.ipax.at/> to add a data check.  Where a choice of languages
exists for a name, then there should be one that corresponds to the main
'name' tag.  In the example above there was name=Scotland but no
name:XX=Scotland.  One should be added to indicate the language of this
name, so that user interfaces can choose among the name:XX tags.  Of course if
an object has just a single name tag to be used for all languages, that's fine.
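The check could look roughly like the following Python sketch.  This is only my illustration of the rule, not the keepright implementation; the function name main_name_language_unknown is invented, and treating every key that starts with 'name:' as a language tag is a simplifying assumption.

```python
def main_name_language_unknown(tags):
    """True if localized names exist but none of them matches the main 'name'."""
    main = tags.get("name")
    if main is None:
        return False  # nothing to check without a main name
    # Assumption: every 'name:XX' key is a language-specific name
    localized = [v for k, v in tags.items() if k.startswith("name:")]
    if not localized:
        return False  # a single name used for all languages is fine
    return main not in localized


print(main_name_language_unknown(
    {"name": "Scotland", "name:fr": "Ecosse", "name:es": "Escocia"}))
print(main_name_language_unknown(
    {"name": "Scotland", "name:en": "Scotland", "name:fr": "Ecosse"}))
```

The first object would be flagged as 'language unknown'; the second, with name:en added, would not.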

What I plan to do is to work through these 'language unknown' warnings and,
with help from a tool, add explicit language tags.  I have manually fixed the
small number of cases in London but it gets more interesting in Wales (where
a user who understands both English and Welsh, but prefers English, will
currently be given the Welsh names) or Turkey (where a user preferring Turkish
to Greek will be given Greek names for many places).

In the new year I plan to write a small tool to help fix these, prompting a
human being to decide or at least verify the language of each name.  Then an
additional name:XX tag will be added to the object.  Sound sensible?
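The tagging step itself, once a human has decided on the language, might be as simple as the Python sketch below.  The tool does not exist yet, so the function name tag_main_name_language and its behaviour (refusing to overwrite an existing, different name:XX) are only assumptions about how it could work.

```python
def tag_main_name_language(tags, lang):
    """Return a copy of tags with name:<lang> set to the main 'name' value."""
    main = tags.get("name")
    if main is None:
        raise ValueError("object has no 'name' tag")
    key = "name:" + lang
    existing = tags.get(key)
    # Do not silently overwrite a different name already tagged in this language
    if existing is not None and existing != main:
        raise ValueError(
            f"{key}={existing!r} already set and differs from name={main!r}")
    updated = dict(tags)
    updated[key] = main
    return updated


before = {"name": "Scotland", "name:fr": "Ecosse", "name:es": "Escocia"}
after = tag_main_name_language(before, "en")
print(after["name:en"])
```

The human decision (here the hard-coded "en") is the part that needs care; the tag edit is mechanical.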

-- 
Ed Avis <eda at waniasset.com>



