[Tagging] Mapping language borders, tagging official languages?

Fri Sep 14 02:43:55 UTC 2018

Currently the primary language of a place can be guessed by looking at the
name=* tags and comparing to name:<language code>= or loc_name, if you can
read the local characters and know the language.

For example, by looking at the map in Pakistan, I can tell that they use
Arabic characters to name places and geographical features, but I would
have to learn to read the script first to know whether the language is
Arabic, Urdu or something else.

It would be useful to tag the primary language of wider communication in a
place, because this information is already implicit in the names of places
but hard to access.

Christoph (@Imagico) has suggested tagging the official language
information on administrative boundary relations:
http://blog.imagico.de/you-name-it-on-representing-geographic-diversity-in-names/

*"In case of Germany the admin_level 2 boundary relation (51477
<http://www.openstreetmap.org/relation/51477>) would get something
like language_format=$de – and there would be no need for further format
strings locally except maybe for a few smaller areas with a local language
or individual features with only a foreign language name."*

Rather than "language_format" I would suggest "official_language=de" or
"language:official=de" for administrative boundaries. Canada, for example,
would have "language:official=en;fr" on the admin_level=2 boundary for the
whole country, I believe.
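
As a rough illustration of how a data consumer might read such a tag, here
is a minimal Python sketch. It assumes the proposed "language:official" key
and the usual OSM convention of separating multiple values with semicolons;
the function name and the sample tag dictionaries are made up for this
example and are not taken from real relations.

```python
def parse_language_tag(tags, key="language:official"):
    """Return the language codes stored under `key` as a list (empty if absent)."""
    raw = tags.get(key, "")
    # OSM multi-values are conventionally separated by ";", sometimes with spaces.
    return [code.strip() for code in raw.split(";") if code.strip()]


# Hypothetical tag sets for two admin_level=2 boundary relations.
canada = {"boundary": "administrative", "admin_level": "2",
          "language:official": "en;fr"}
germany = {"boundary": "administrative", "admin_level": "2",
           "language:official": "de"}

print(parse_language_tag(canada))
print(parse_language_tag(germany))
```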

Because not all places have officially designated languages, and places can
have multiple official languages, we can also use "language:primary=*" for
the most common language of trade, education, business and communication.
This could be designated on the administrative boundary relation when this
is verifiable; for example in Quebec the primary language is French, while
in most of the provinces of Canada it would be English.
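
One way a renderer or geocoder could combine the two keys is sketched below
in Python. This is only an assumption about how the tags might be consumed,
not part of the proposal: it takes the tag sets of all administrative
boundaries enclosing a point, prefers "language:primary" from the most local
boundary that has it, and otherwise falls back to "language:official". The
function name, the sample data and the admin_level values are illustrative.

```python
def resolve_language(boundaries):
    """Pick language codes for a point from the boundaries that enclose it.

    `boundaries` is a list of tag dicts, each with an "admin_level" value.
    """
    # A higher admin_level means a smaller, more local area, so check those first.
    ordered = sorted(boundaries, key=lambda b: int(b["admin_level"]), reverse=True)
    # Prefer a primary language at any level, then fall back to official ones.
    for key in ("language:primary", "language:official"):
        for tags in ordered:
            if key in tags:
                return [c.strip() for c in tags[key].split(";") if c.strip()]
    return []


# Hypothetical tagging, following the Canada example above.
country = {"admin_level": "2", "language:official": "en;fr"}
quebec = {"admin_level": "4", "language:primary": "fr"}
untagged_province = {"admin_level": "4"}

print(resolve_language([country, quebec]))
print(resolve_language([country, untagged_province]))
```

With this ordering, a province without its own tag simply inherits the
country-level official languages, which matches the idea that local tags
are only needed where they differ.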

The most complicated issue would be areas where local languages do not
relate to existing boundaries. For example, Indonesia has about 700
languages, and at least 300 are used as the primary, majority language for
communication within particular regions, while the official language
(Indonesian) is used for education, trade and government. In these areas
often places and geographical features will be named in the local language.
This is also the case in areas of Latin America and most of Africa, where
various indigenous languages are the primary language used in many small
regions.

In this case, there are two options.
1) Map the boundaries between areas. This would take work to verify; in each
place a mapper would need to check signs and interview local residents
about the local language. However, there are already sources of global
language boundary data that could be imported into the OSM database if we
want it, e.g. from Ethnologue. (I work with SIL and I could ask for this, if
we want). It would be questionable where to draw the line in sparsely
inhabited areas, but this problem already happens when selecting the name=*
tag for places in wildernesses; generally the language of the closest
settlement is used.

2) Tag each inhabited place with a language.
This might be considered more verifiable, because languages are a feature
of human geography, which is best represented by place data in OSM. It is
certainly possible for a local mapper to verify the majority language
spoken in their own community and neighboring settlements, and this would
not require drawing a line through uninhabited areas. However, it would be
more work to add tags to each place=hamlet/village/town/neighborhood than
to tag a single boundary line.
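
If option 2 were adopted, a data consumer could guess the language for an
untagged feature from the closest tagged settlement, much like the
closest-settlement rule mentioned under option 1. The Python sketch below
shows the idea under stated assumptions: places carry a "language:primary"
tag, distance is plain great-circle distance, and the coordinates and the
"xx"/"yy" language codes are placeholders, not real data.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_place_language(lat, lon, places, key="language:primary"):
    """Return the language tag of the closest tagged place, or None."""
    tagged = [p for p in places if key in p["tags"]]
    if not tagged:
        return None
    closest = min(tagged,
                  key=lambda p: haversine_km(lat, lon, p["lat"], p["lon"]))
    return closest["tags"][key]


# Placeholder settlements; coordinates and language codes are invented.
places = [
    {"lat": -4.10, "lon": 138.90,
     "tags": {"place": "village", "language:primary": "xx"}},
    {"lat": -4.60, "lon": 139.40,
     "tags": {"place": "hamlet", "language:primary": "yy"}},
    {"lat": -4.30, "lon": 139.10, "tags": {"place": "hamlet"}},  # untagged
]

print(nearest_place_language(-4.12, 138.92, places))
```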

[Also see: https://wiki.openstreetmap.org/wiki/Multilingual_names]

-Joseph Eisenberg