[openstreetmap/openstreetmap-website] Add lang attribute to Nominatim results from CJK languages (PR #6079)
Nick Doiron
notifications at github.com
Mon Jun 2 03:12:14 UTC 2025
### Description
In Unicode, some CJK characters such as 化 have a single codepoint but appear differently in Simplified Chinese (<span lang="zh-Hans">化</span>), Traditional Chinese (<span lang="zh-Hant">化</span>), and Japanese (<span lang="ja">化</span>). This is a consequence of [Han unification](https://en.wikipedia.org/wiki/Han_unification), and the problem has appeared over the years [in many software projects](https://issues.chromium.org/issues/41315603). On the frontend, we can display names correctly by setting an HTML attribute such as `lang="zh-Hant"`.
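To illustrate what the frontend change amounts to, here is a minimal Ruby sketch that wraps a name in a `<span>` carrying a language tag. The helper name `name_with_lang` is hypothetical and not taken from the PR's diff. When no tag is known, it emits the bare escaped name, which leaves glyph selection to the browser's default.

```ruby
require "erb"

# Hypothetical helper: wrap a place name in a <span> with a BCP 47 language
# tag so the browser picks locale-appropriate glyphs for unified Han
# characters. Both the name and the tag are HTML-escaped.
def name_with_lang(name, lang)
  escaped = ERB::Util.html_escape(name)
  return escaped if lang.nil? || lang.empty?

  %(<span lang="#{ERB::Util.html_escape(lang)}">#{escaped}</span>)
end

puts name_with_lang("彰化", "zh-Hant") # prints <span lang="zh-Hant">彰化</span>
puts name_with_lang("彰化", nil)       # prints 彰化 (no tag: browser default)
```

Inside a Rails view, `tag.span(name, :lang => lang)` produces the same escaped markup without a custom helper.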
This was addressed in iD in https://github.com/openstreetmap/iD/pull/10716 and has been a long-running discussion in openstreetmap-carto.
If we add `&addressdetails=1` to Nominatim queries, each result includes an address breakdown with a `country_code`, so we can choose the best `lang` value for results in mainland China, Hong Kong, Japan, or Taiwan.
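The country lookup could be sketched as follows. This is a minimal sketch under stated assumptions, not the PR's code. `COUNTRY_LANG`, `lang_for_result`, and `nominatim_query` are hypothetical names, and the mapping only covers the four regions named above. The `subdivision` argument stands in for whatever Hong Kong check the "fix HK subregion" commit performs, which is not visible here. The sketch relies on the `address.country_code` field that Nominatim returns in lowercase when `addressdetails=1` is set. It requests JSON output only for brevity; the site's own controller may use a different output format.

```ruby
require "json"
require "uri"

# Hypothetical mapping from Nominatim's lowercase address.country_code to a
# BCP 47 language tag; only the regions named in the PR are listed.
COUNTRY_LANG = {
  "cn" => "zh-Hans", # mainland China: Simplified Chinese
  "hk" => "zh-HK",   # assumed: only used if Hong Kong reports its own code
  "tw" => "zh-Hant", # Taiwan: Traditional Chinese
  "jp" => "ja"       # Japan
}.freeze

# Returns a lang tag for one parsed Nominatim result, or nil when the result
# is outside the listed regions (the browser default then applies).
# `subdivision` is a hypothetical ISO 3166-2 code supplied by the caller.
def lang_for_result(result, subdivision = nil)
  return "zh-HK" if subdivision == "CN-HK"

  code = result.dig("address", "country_code")
  code && COUNTRY_LANG[code.downcase]
end

# Query string asking Nominatim to include the address breakdown.
# JSON output is used here for brevity only.
def nominatim_query(text)
  URI.encode_www_form(q: text, format: "jsonv2", addressdetails: 1)
end

# Trimmed-down, illustrative result (not real Nominatim output).
sample = JSON.parse('{"address": {"country_code": "tw"}}')
puts lang_for_result(sample) # prints zh-Hant
puts nominatim_query("彰化") # prints q=%E5%BD%B0%E5%8C%96&format=jsonv2&addressdetails=1
```

Because the tag comes from `country_code` rather than from `display_name`, it stays the same whichever language the browser requests.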
### How has this been tested?
This can be tricky to test, as **many names do not change** (their characters look the same in every locale), and the `display_name` will be in your browser's language if that language is available:
- Search results will have a `lang` attribute, such as `lang="zh-HK"` or `lang="ja"`, regardless of the language of `display_name`
- In Taiwan, a search result for <span lang="zh-Hant">彰化</span> should show a horizontal bar in <span lang="zh-Hant">化</span>
- In mainland China, a search result for <span lang="zh-Hans">玉门 expressway</span> should show the second character, <span lang="zh-Hans">门</span>, with a split frame, not the form of 门 with a +
### Notes
As an alternative to adding `&addressdetails=1` to queries, we could possibly parse `display_name` (although it varies with the browser language) or use geographic bounding boxes?
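For comparison, the bounding-box alternative could look like the sketch below. `Box` and `lang_for_point` are hypothetical names, and the example boxes use made-up coordinates rather than real regional extents. Rectangles cannot follow borders, and any rectangle around mainland China would also contain Hong Kong and Taiwan. As a result, boxes would have to be checked from most to least specific, and points near a border could still be misclassified. The country-code lookup avoids both problems.

```ruby
# Hypothetical bounding box tagged with a language; coordinates in degrees.
Box = Struct.new(:lang, :min_lat, :min_lon, :max_lat, :max_lon) do
  def cover?(lat, lon)
    lat.between?(min_lat, max_lat) && lon.between?(min_lon, max_lon)
  end
end

# Returns the lang of the first box containing the point, or nil.
# Order matters: nested (smaller) boxes must be listed before larger ones.
def lang_for_point(lat, lon, boxes)
  boxes.find { |box| box.cover?(lat, lon) }&.lang
end

# Made-up coordinates for illustration only, not real regional extents.
outer = Box.new("outer-lang", 0.0, 0.0, 10.0, 10.0)
inner = Box.new("inner-lang", 4.0, 4.0, 6.0, 6.0)
puts lang_for_point(5.0, 5.0, [inner, outer]) # prints inner-lang
puts lang_for_point(5.0, 5.0, [outer, inner]) # prints outer-lang: wrong order
```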
This matching of languages is imperfect, but without a `lang` attribute the browser always falls back to its default locale for every CJK character. It would be difficult to make exceptions (for example, Japanese restaurants in these countries) without a name regex, a language tag, or access to other tags.
This does not affect Chinese names in other countries.
I have heard that Cyrillic letters also have regional variants in [Bulgaria](https://en.wikipedia.org/wiki/Bulgarian_alphabet) and [Serbia](https://en.wikipedia.org/wiki/Serbian_Cyrillic_alphabet#Differences_from_other_Cyrillic_alphabets), particularly in italics, but I don't know how universal this is. [Additional info](https://commons.wikimedia.org/wiki/File:Special_Cyrillics_BGDPT.svg)
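One form of the "name regex" mentioned above is a script check: a name containing Hiragana or Katakana is very likely Japanese. The minimal sketch below overrides the country default in that case. `lang_with_name_hint` is a hypothetical name and is not part of the PR. The sketch also shows the limit of the approach: a Japanese name written only in kanji gives no signal, so it keeps the country's tag.

```ruby
# Hiragana or Katakana anywhere in a name is a strong hint that it is Japanese.
KANA = /[\p{Hiragana}\p{Katakana}]/

# Hypothetical override: return "ja" for kana-bearing names, otherwise keep the
# country-based tag (which may be nil). Kanji-only names give no signal.
def lang_with_name_hint(name, country_lang)
  name.match?(KANA) ? "ja" : country_lang
end

puts lang_with_name_hint("ラーメン", "zh-Hans") # prints ja
puts lang_with_name_hint("東京", "zh-Hans")     # prints zh-Hans
```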
You can view, comment on, or merge this pull request online at:
https://github.com/openstreetmap/openstreetmap-website/pull/6079
-- Commit Summary --
* add lang attribute to results from CJK countries, plus Cyrillic
* remove Bulgaria/Serbia for now
* fix HK subregion
-- File Changes --
M app/controllers/concerns/nominatim_methods.rb (2)
M app/controllers/searches/nominatim_queries_controller.rb (7)
M app/helpers/geocoder_helper.rb (2)
-- Patch Links --
https://github.com/openstreetmap/openstreetmap-website/pull/6079.patch
https://github.com/openstreetmap/openstreetmap-website/pull/6079.diff
More information about the rails-dev
mailing list