[OSM-talk] Explicit tagging of name language
Ed Avis
eda at waniasset.com
Mon Dec 6 10:00:57 GMT 2010
This message describes a problem that user interfaces such as Nominatim have when
choosing the correct localized 'name' tag to show to the user, explains why I
believe it is caused by incomplete tagging, and proposes a fix.
I was surprised when searching on the OSM home page to find Edinburgh was not
in Scotland but in 'Ecosse'. My preferred language is English, although I do
understand a few words of French, and I had set my browser language preferences
accordingly. The browser sends an HTTP Accept-Language header giving English a
better score than French. So why does Nominatim think I would prefer to see the
French name for the country?
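To make the header side concrete, here is a minimal sketch of how a program
might turn an Accept-Language header into an ordered list of languages. The
function name parse_accept_language and the q-values in the example header are
my own assumptions for illustration; this is not Nominatim's actual code.

```python
def parse_accept_language(header):
    """Return language tags ordered from most to least preferred."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        lang = parts[0].strip().lower()
        if not lang:
            continue
        q = 1.0  # a language without an explicit q-value gets full weight
        for param in parts[1:]:
            key, _, value = param.strip().partition("=")
            if key.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable weight as "not acceptable"
        if q > 0:
            # sort by descending weight, then by position in the header
            weighted.append((-q, position, lang))
    return [lang for _, _, lang in sorted(weighted)]


# Hypothetical header: English preferred, French acceptable with a lower score.
preferred = parse_accept_language("en, fr;q=0.4")
print(preferred)  # ['en', 'fr']
```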
It is because it sees some tags like (simplified to illustrate):
name=Scotland
name:fr=Ecosse
name:es=Escocia
Given that, and the user's preferred languages [en, fr], what name should be
picked? The program cannot know that the name 'Scotland' is in English, so the
best course of action is to pick a name that the browser says it will accept.
If none of the names is tagged with an accepted language then it can fall back
to the ordinary 'name' tag as a last resort, but if some localized names are
there then they should be used. The alternative would be no localization.
I have noticed similar problems when searching in the USA: someone added Serbian
Cyrillic names for the 50 states, which now pop up instead of the English names
because I have included Serbian in my language list, albeit with a tiny score.
I believe the answer, as so often, is to improve the tagging used so that
software has the information it needs. In this case an explicit English-language
name should be added, so that we have:
name=Scotland
name:en=Scotland
name:fr=Ecosse
name:es=Escocia
(Another way to tag the same info would be to invent a new tag
'language_of_main_name=en' but this seems cumbersome and would not be understood
by existing software.)
In an attempt to fix this I have asked the maintainer of
<http://keepright.ipax.at/> to add a data check. Where an object has names in
a choice of languages, one of them should correspond to the main 'name' tag.
The example above fails this check: it has name=Scotland but no
name:XX=Scotland. One should be added to indicate the language of the main
name, so that user interfaces can choose among the name:XX tags. Of course if
an object has just a single name tag to be used for all languages, that's fine.
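A minimal sketch of such a check, assuming tags are held as a simple
dictionary. The function name is hypothetical, and for simplicity it treats
every key beginning with 'name:' as a language-specific name, which a real
check would need to refine. I do not know how keepright would implement it.

```python
def main_name_language_unknown(tags):
    """True if localized names exist but none of them equals the main name."""
    main = tags.get("name")
    if main is None:
        return False  # nothing to check without a main name
    localized = [value for key, value in tags.items() if key.startswith("name:")]
    if not localized:
        return False  # a single name used for all languages is fine
    return main not in localized


before = {"name": "Scotland", "name:fr": "Ecosse", "name:es": "Escocia"}
after = dict(before, **{"name:en": "Scotland"})
print(main_name_language_unknown(before))  # True
print(main_name_language_unknown(after))   # False
```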
What I plan to do is to work through these 'language unknown' warnings and,
with help from a tool, add explicit language tags. I have manually fixed the
small number of cases in London but it gets more interesting in Wales (where
a user who understands both English and Welsh, but prefers English, will
currently be given the Welsh names) or Turkey (where a user preferring Turkish
to Greek will be given Greek names for many places).
In the new year I plan to write a small tool to help fix these, prompting a
human being to decide or at least verify the language of each name. Then an
additional name:XX tag will be added to the object. Sound sensible?
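To give an idea of the core step, here is a rough sketch. Nothing is written
yet, so all names are placeholders: 'ask' stands for whatever prompts the
human and returns a language code (or None to skip), and could simply wrap
input() in an interactive tool.

```python
def add_language_tag(tags, ask):
    """Return a copy of tags with name:XX added for the main name."""
    updated = dict(tags)
    main = updated.get("name")
    if main is None:
        return updated
    lang = ask(main)  # the human decides or verifies the language
    if not lang:
        return updated  # skipped: leave the object unchanged
    key = "name:" + lang
    if key in updated and updated[key] != main:
        return updated  # conflicting existing tag: leave it for manual review
    updated[key] = main
    return updated


tags = {"name": "Scotland", "name:fr": "Ecosse", "name:es": "Escocia"}
# A fixed answer stands in for the human here so the example runs unattended.
fixed = add_language_tag(tags, lambda name: "en")
print(fixed["name:en"])  # Scotland
```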
--
Ed Avis <eda at waniasset.com>