[Tagging] What language is the name tag value? Was: Which languages are admissible for name:xx tags?
tod at fitchfamily.org
Thu Mar 26 22:53:41 UTC 2020
I was blissfully unaware of the issues involved in creating a bilingual map (dual labeled in the local language and in my language) until I actually tried to create one.
Most objects in OSM do not have name:<lg> tags for non-local languages, and it is unreasonable to expect every named village, town, river and stream in some far-off land to have a name:<lg> value for your language, especially since the vast majority of those values would be transliterations, which we would like to avoid.
In my case I thought it would be useful to have a printed map with features showing the local name in the local language and script along with an English version of the name. If name:en exists, use it. If not, use an automatically generated phonetic transliteration. The problem I ran into is that you need to know the language used in the name tag to know which rules to use for transliteration.
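A rough Python sketch of that labeling rule, to show where the language comes in. The function name bilingual_label, the transliterators mapping and the "Local (English)" output format are all made up for illustration; no real transliteration library is assumed.

```python
from typing import Callable, Dict, Optional

# Hypothetical: maps a language code to a function that turns a name
# in that language into a phonetic Latin-script rendering.
Transliterators = Dict[str, Callable[[str], str]]


def bilingual_label(
    tags: Dict[str, str],
    name_language: Optional[str],
    transliterators: Transliterators,
) -> Optional[str]:
    """Return a 'local (English)' label, or just the local name."""
    local = tags.get("name")
    if local is None:
        return None

    # Prefer a mapper-supplied English name when there is one.
    english = tags.get("name:en")

    # Otherwise transliterate, which is only possible if we know the
    # language of the name value and have rules for that language.
    if english is None and name_language in transliterators:
        english = transliterators[name_language](local)

    # Nothing usable, or nothing new to show: print the local name alone.
    if english is None or english == local:
        return local
    return f"{local} ({english})"
```

The name_language argument is the piece OSM data does not reliably supply. When it is None the sketch can only fall back to the local name.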
As I understand it, there are several ways to determine the language, and there have been at least a couple of proposals or tagging list discussions on this topic, but with no consensus. The ones that come to mind for me are:
1. The current situation, where there is no formal method. If one follows the wiki recommendation that the name value be duplicated into a name:<lg> tag, then the language can be determined in many cases. Unfortunately this falls apart if the same spelling and alphabet are used with different pronunciations. An example that comes to mind is the city of Paris, where there are several name:<lg> tags that exactly match the name value. Which one should be picked? The pronunciations differ between languages, so if you are going to attempt automatic transliteration you would like to correctly pick French as the language to transliterate from. So the current situation is not ideal: the wiki recommendation is seldom followed. Where it is followed, some mappers are going to decide that the “duplicates” are undesirable and remove them. And even when it is followed, there are fairly common cases where it still is not possible to determine the language.
2. Create a scheme where a default language can be set on boundaries, as has been suggested by Joseph Eisenberg. This has the advantage that relatively few objects need to be tagged; for example, a single tag might be enough to cover the continental United States. But it falls apart for features that are on the boundary between multiple language areas (the Mediterranean Sea, for example) and for areas that are multilingual (Wales, for example). In addition, it seems that any type of “we should add a default for an administrative area” proposal that has come up here has been rejected or “bike shedded” into oblivion.
3. Drop the name tag altogether and add a name formatting tag. As I understand it, the formatting specification would take one or more name:<lg> values and specify how they should be combined to create a name for display purposes. Migrating to this scheme could be hard and may only be possible if a default name format could be specified for an area, as in the default language scheme in 2 above. Even if a default mechanism could be agreed on, changing all the name=* to name:<lg>=* for even a relatively small monolingual area would be a big task if automated changes are not used. Finally, it does not resolve the ambiguity in the language used.
4. Create a new tag explicitly specifying the language used in the name tag. This has the same disadvantage as the current wiki recommendation of duplicating the name value into a name:<lg> value, in that it has to be done on each object. But it does have the advantage that there is no ambiguity about the language of the name tag value, and it can be rolled out a little at a time.
5. ??? There are probably other schemes that have been proposed that I haven’t noticed.
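To make the ambiguity in approach 1 concrete, here is a rough Python sketch that guesses the language by looking for name:<lg> values identical to the name value. The function names are made up, the regular expression is only my approximation of what a language suffix looks like, and the optional default_language argument stands in for a value inherited from an enclosing boundary as in approach 2, which is not something current OSM data provides.

```python
import re
from typing import Dict, List, Optional

# Assumption: a language suffix is 2-3 lowercase letters, optionally
# followed by subtags such as a script (e.g. "sr-Latn"). This keeps
# keys like name:left or name:etymology out of the candidate list.
_LANG_KEY = re.compile(r"name:([a-z]{2,3}(?:-[A-Za-z0-9]+)*)")


def candidate_languages(tags: Dict[str, str]) -> List[str]:
    """Languages whose name:<lg> value exactly matches the name value."""
    name = tags.get("name")
    if name is None:
        return []
    found = []
    for key, value in tags.items():
        match = _LANG_KEY.fullmatch(key)
        if match and value == name:
            found.append(match.group(1))
    return sorted(found)


def guess_name_language(
    tags: Dict[str, str], default_language: Optional[str] = None
) -> Optional[str]:
    """Return a language code, or None if it cannot be decided."""
    candidates = candidate_languages(tags)
    if len(candidates) == 1:
        return candidates[0]  # unambiguous duplicate of the name value
    if not candidates:
        return default_language  # no duplicates: rely on the area default
    # Several duplicates: the area default can only break the tie if it
    # is one of them, otherwise the language stays unknown.
    return default_language if default_language in candidates else None


paris = {
    "name": "Paris",
    "name:fr": "Paris",
    "name:en": "Paris",
    "name:de": "Paris",
    "name:ru": "Париж",
}
print(candidate_languages(paris))        # ['de', 'en', 'fr']
print(guess_name_language(paris))        # None
print(guess_name_language(paris, "fr"))  # fr
```

With only the tags on the object, Paris stays undecided. It takes extra information, whether an area default or an explicit per-object tag as in approach 4, to settle on French.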
> On Mar 25, 2020, at 11:25 PM, Joseph Eisenberg <joseph.eisenberg at gmail.com> wrote:
> That's why I previously proposed a tag like default_language=* which
> could be added to features and boundaries. See
> Unfortunately that was not approved. It's a confusing topic, many of
> the people who opposed the proposal seemed to think it would do
> something else.
> -- Joseph Eisenberg