<div dir="ltr"><div class="gmail_default" style="font-family:trebuchet ms,sans-serif">Hi, <br></div><div class="gmail_default" style="font-family:trebuchet ms,sans-serif">I believe a little more information needs to be added here to point the discussion in the right direction. The language in question does not use the Latin script, and its Unicode block is completely fine. So there is no problem writing that language anywhere on the internet, and no additional special font is needed to render it properly. It is also true that Unicode contains all the letters of all the languages, so if a font covers the language-specific Unicode block, the text should be displayed properly. <br></div><div class="gmail_default" style="font-family:trebuchet ms,sans-serif"><br></div><div class="gmail_default" style="font-family:trebuchet ms,sans-serif">So far, from my experience, I can say that the country's map is not complete in OSM. Because OSM is an open source product, there is a trust and dependability issue as well. More people are trying to use it and showing interest nowadays because Google is expensive. What are the outlets for using OSM on desktop and mobile? Those who have been active and contributing for longer than I have can easily list the most popular apps or sites for using OSM. How many of them support complex non-Latin Unicode characters perfectly? I have found only a very few, but there could be more. <br></div><div class="gmail_default" style="font-family:trebuchet ms,sans-serif"><br></div><div class="gmail_default" style="font-family:trebuchet ms,sans-serif">So the map is incomplete, with little data, some of which is incorrect, and we are forcing it to move to a place where it will become completely useless, because the software cannot show the text properly. Will that be helpful for that language or for that country? 
<br></div><div class="gmail_default" style="font-family:trebuchet ms,sans-serif"><br></div><div class="gmail_default" style="font-family:trebuchet ms,sans-serif">I am not against using the native language; on the contrary, I have been contributing to Wikipedia and a number of open source communities in that same language for more than 12 years. I am also involved in a number of language-specific national expert committees. But here I am giving my opinion not to use the native language, at least for the time being. <br></div><div class="gmail_default" style="font-family:trebuchet ms,sans-serif"><br></div><div class="gmail_default" style="font-family:trebuchet ms,sans-serif">It is true that all the people of the country are comfortable with and prefer their native language. Can you please provide me data on what percentage of them use the OSM website, and how many of them have a navigation app based on OSM? What use cases are we targeting? <br></div><div class="gmail_default" style="font-family:trebuchet ms,sans-serif"><br></div><div class="gmail_default" style="font-family:trebuchet ms,sans-serif">If, for now, we write `name` in English, add the native name in `name:xx`, and also add the English name in `name:en`, will it be impossible to move all the `name:xx` values to `name` when the situation improves? 
I believe it could be done with automated scripts.</div><div class="gmail_default" style="font-family:trebuchet ms,sans-serif"><br></div><div class="gmail_default" style="font-family:trebuchet ms,sans-serif">Regards</div><div class="gmail_default" style="font-family:trebuchet ms,sans-serif">Nasir Khan<br></div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><span style="font-family:trebuchet ms,sans-serif"><br>--<br><b>Nasir Khan Saikat</b><br><a href="http://www.nasirkhn.com" target="_blank">www.nasirkhn.com</a></span><div><div><span style="font-family:trebuchet ms,sans-serif"><br></span></div></div></div></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 29 Nov 2019 at 00:18, Philippe Verdy <<a href="mailto:verdy_p@wanadoo.fr">verdy_p@wanadoo.fr</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">XML never started from scratch based on old versions of SGML or any updated version of SGML.<div>When it was created, Unicode was already there and its support in XML was mandatory from the start, including the support for UTF-8 by default. 
It was based on the earlier work on XHTML, which already included Unicode support by default as well, from the then-current development of HTML4, which was also updated to enforce the behavior for Unicode (notably, it was made clear that, to be conforming, the numeric character references could only refer to the UCS code points, independently of the charset used for the document, and that all charsets had to have a mapping to the UCS).</div><div><br></div><div>Now the issue is possibly elsewhere: when languages use a script or orthography not based on Unicode because it is still not well supported or has problems.</div><div>- there were problems for Korean in Unicode 1.0 before the merge with ISO 10646, but Unicode 1.0 is long dead and no software today makes any reference to Unicode 1.0;</div><div>- there have been problems with the Unicode encoding for Burmese and Mongolian; they are mostly solved, except for Mongolian, with work still pending on the behavior of some clusters and the best way to encode the vowels. This will soon change, but yes, in that case there are problems. The change, however, will not be about adopting Unicode or not, but about the best sequences of Unicode characters to use to represent these clusters: this is an orthographic change, not a change of encodings. Yes, in that case it means replacing Unicode fonts with other, updated Unicode fonts, but no hacks based on legacy charsets are involved.</div><div><br></div><div>Now there remain languages/scripts not encoded at all (not in Unicode and not even in any other charset): making a reference to a legacy ISO charset is inapplicable as there's no such legacy charset. 
All that can be done for now in these languages is to use some transliteration (but not necessarily Latin). Uyghur, for example, is generally written in that case using Chinese sinograms (with some specific forms in rare cases) or Arabic (with some additional diacritics and forms; if these forms are not handled in fonts, at least there is a basic orthography that is readable, the same way that we can substitute some characters in Latin, remove some diacritics for African languages, or simply write digrams instead of encoding some ligatures). This is what already happens when these languages are used in some international documents and forms like passports: the orthography is degraded, but it is still readable and sufficiently distinctive for practical uses, and isolated text fragments are not the only source of disambiguation, as there is other contextual information, including photo and biometric data or unique identifiers, a scanned handwritten signature, and personal data, including an address for identification purposes.</div><div><br></div><div>Anyway, even if there's a preferred orthography, slight deviation in orthography is very common and frequently used in public displays or advertising, and no one is confused. And the "preferred" orthography is just a matter of choice and is unstable across time, or even space when there are competing authorities providing their own local terminology for some local official uses, and it is not mandatory everywhere (most languages also have a lot of dialects that may use different orthographies to render their own local phonology and accents: not everyone agrees with these preferred forms, even in the same location where dialects are also competing). And let's remember that all modern languages continue to evolve and borrow a lot from other languages, and new terms are creatively added. 
Finally there are orthographic reforms, but they take a considerable time to be adopted, or never reach acceptance, and legacy orthographies remain visible in a lot of places and publications (plus, people are much more mobile today and there are widespread communities located around the world that adapt constantly to their new context and on which the official reforms have no impact).</div><div><br></div><div>So in conclusion, there's no other choice than Unicode today. Unicode is mandatory in XML, and in OSM. Don't speak about legacy charsets. We are just concerned with support in fonts: ALL characters encoded up to Unicode 9.0 have suitable fonts immediately usable, and these fonts are all free for use, and based on TrueType/OpenType. All OSM rendering software should be able to use TrueType/OpenType fonts. The only remaining problem is the existence of mobile phones that don't have a lot of embedded fonts and support a more limited set. But none of them use or need any legacy charsets.</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 28 Nov 2019 at 15:11, John Whelan <<a href="mailto:jwhelan0112@gmail.com" target="_blank">jwhelan0112@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>The way I would approach this professionally
would be to define the requirements first.<br>
<br>
In this case we have a requirement to display the name in the language
of choice.<br>
<br>
We also have a requirement to be compatible with existing software.<br>
<br>
Pragmatically I would recommend changing the name field to use only an
8-bit Latin alphabet character set, recognizing that not all systems can
handle more complex character sets. Which precise character set should
be chosen would be a subject for discussion, but either ISO-8859-1 or Windows-1252 would be contenders. My personal preference
would be the ISO standard.<br>
<br>
Unicode is nice but we managed with 6 bit character sets for many years
when I started with computers. Even accented characters were a major
problem. Also remember that .OSM data is in XML format and XML came out
of SGML, which was first used to transmit documents over modems, so only 7
bits were available for encoding characters. The extended characters
use a special escape code sequence to hold the Unicode characters.<br>
<br>
Realistically software never wears out but source code gets lost.
Compilers and operating systems get updated. It may not be possible to
modify existing software to handle Unicode characters. I have a
perfectly good scanner sitting in the corner that can no longer be used
with Win 10 because of a new and improved driver. With the
OpenStreetMap environment there isn't even a way to get a complete list
of software that uses the OpenStreetMap data so it can be tested. <br>
<br>
The local language can be added in a name: tag; software that can
handle the local names can then pick it up. OsmAnd etc. can be configured to
use the local name transparently so the local population can use it in the
language of their choice.<br>
<br>
This approach would appear to meet the requirements. The argument that
we should change all the existing software to meet a requirement that
was not clearly defined when the software was written doesn't make sense
to me.<br>
<br>
Cheerio John<br>
<br>
<span>Frederik Ramm wrote on 2019-11-28 3:25 AM:</span><br>
<blockquote type="cite">
<pre>John,
On 28.11.19 01:40, John Whelan wrote:
</pre>
<blockquote type="cite"><pre>Is there any reason why name:en could not be used?
</pre></blockquote>
<pre>The country's official language requires a "non-standard" font to be
available which does not seem to be a given on all platforms. Like if
you set up a standard tile server and don't install extra fonts you will
see little squares instead of place names all over China.
Apparently not all applications are as good in name:xx handling as
OsmAnd. A recurring point in the discussion is that the proponents of
using the official language say "we shouldn't fall back to English name
tags just because some apps/web sites are broken, we should file bug
reports with them instead", and the proponents of using English say
"let's be pragmatic, there's no way all these apps/sites will be fixed
within a short time, so we should use English".
Bye
Frederik
</pre>
</blockquote>
<br>
</div>
_______________________________________________<br>
HOT mailing list<br>
<a href="mailto:HOT@openstreetmap.org" target="_blank">HOT@openstreetmap.org</a><br>
<a href="https://lists.openstreetmap.org/listinfo/hot" rel="noreferrer" target="_blank">https://lists.openstreetmap.org/listinfo/hot</a><br>
</blockquote></div>
</blockquote></div>