[OSM-talk] Chinese name tagging

80n 80n80n at gmail.com
Thu Aug 2 23:11:37 BST 2007


On 8/2/07, D Tucny <d at tucny.com> wrote:
>
> Hi Folks,
>
> With AND 'major roads' data for China on its way and a slowly growing
> collection of Chinese data, I just wanted to throw some things about
> regarding tagging in Chinese that could also carry into other
> non-'latin' script areas...
>
> Why Chinese tagging is complicated
> Written Chinese carries the meaning, not the pronunciation. There are
> two main modern forms of written Chinese, Simplified Chinese and
> Traditional Chinese, the main difference between the two is that,
> potentially obviously, a lot of characters in Simplified Chinese have
> been simplified compared with their traditional variants... Simplified
> Chinese is used in mainland China, Singapore and Malaysia, Traditional
> Chinese is used in Hong Kong, Macau and Taiwan... So that covers
> written Chinese, however, there are also multiple forms of spoken
> Chinese, the main two are Mandarin and Cantonese, Mandarin is used in
> mainland China, Taiwan, Singapore and Malaysia, Cantonese is used in
> the Guangdong province, Hong Kong and Macau. There are many forms of
> romanisation used with these spoken forms of Chinese, the main one for
> Mandarin is Pinyin, for Cantonese there seems to be no clear leader...
> To complicate things a bit further, Mandarin and Cantonese are both
> tonal languages, Mandarin has 4 tones (or 5 if you include the neutral
> tone) and Cantonese has 9, these tones are optionally used in the
> romanization schemes too...
>
> So, now that we know it's complex...
>
> What do we need to cover?
> Well, we'd want our maps to be useful to local folks, so, we need
> Chinese, probably best to use the form of Chinese that is used in the
> area, e.g. Simplified in mainland China, Traditional in Hong Kong.
> We probably want our maps to be useful to people who don't know any
> Chinese, as English is used in lots of places as the international
> alternative, then we probably want to have English where available,
> especially where signs have English too, most road signs at least will
> have English or Pinyin depending on the place, along with the Chinese
> characters... We probably want to give people the chance of being able
> to say names even if they don't understand the written characters, so
> a romanised form would be useful, i.e. Pinyin where Mandarin is
> spoken, and to really give people a chance of being understood, we
> probably need to include tones where we can, tone numbers could be
> useful for those who know what those numbers refer to and would be
> easy to render, but tone 'accents' could probably be useful to
> more people...
> In summary, that gives us, Chinese, English, Pinyin (for Mandarin
> speaking areas), Pinyin with tones (for Mandarin speaking areas)...
>
> How do we tag it?
>
> Using Shanghai as an example, its name is tagged like so...
>
> name=上海
> name:en=Shanghai
> name:zh=上海
> name:zh_py=Shanghai
> name:zh_pyt=Shànghǎi
> or
> name:zh_pyt=Shang4hai3 (if you've never seen numeric tone markers
> before, this probably isn't going to help your pronunciation much)
>
> Got all our bases covered, but, it looks pretty wasteful, lots of
> duplication, so, let's take a real world street name example...
>
> name=环城北路
> name:en=Huancheng North Road
> name:zh=环城北路
> name:zh_py=Huancheng Bei Lu
> name:zh_pyt=Huánchéng Běi Lù
>
> Rendering
> Currently, only name is rendered on our public maps...
>
> osmarender/t@h will render Chinese text if the machine rendering the
> tile has a suitable Chinese font installed, but that varies...
>
> mapnik renders nice boxes where Chinese characters should live as the
> font doesn't have the characters and mapnik currently doesn't fall
> back to another font...
>
> It could perhaps be good to render one/some of the other forms, I've
> done some renders with osmarender with Chinese and Pinyin with tones
> on roads and it looked pretty good... T@H/mapnik folks?
>
> Of course, if you are doing custom rendering, you can control what
> gets rendered and how it gets rendered, and if you have the
> information that you want to render already there, you'll be much
> happier than if you have to make it/try to automatically generate
> it/leave it out...
>
> So... there's my mind dump on the subject... any comments?
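
The two name:zh_pyt variants in the quoted message (tone numbers vs. tone accents) can in principle be derived from each other, so only one would need to be stored. The following is a minimal sketch in Python, assuming the usual Pinyin placement rule (mark "a" or "e" if present, else the "o" of "ou", else the last vowel). The names TONE_MARKS, mark_syllable and numbered_to_marked are made up for illustration and are not part of any OSM tool.

```python
import re
import unicodedata

# Combining diacritics for tones 1-4; tone 5 (neutral) carries no mark.
TONE_MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030c", "4": "\u0300"}


def mark_syllable(letters, tone):
    """Return one Pinyin syllable with its tone number turned into an accent."""
    mark = TONE_MARKS.get(tone)
    if mark is None:  # neutral tone: just drop the digit
        return letters
    lower = letters.lower()
    if "a" in lower:
        pos = lower.index("a")
    elif "e" in lower:
        pos = lower.index("e")
    elif "ou" in lower:
        pos = lower.index("o")
    else:
        vowels = [i for i, ch in enumerate(lower) if ch in "iouü"]
        if not vowels:  # no vowel found: leave the syllable untouched
            return letters
        pos = vowels[-1]
    # Compose base letter + combining mark into a single character.
    marked = unicodedata.normalize("NFC", letters[pos] + mark)
    return letters[:pos] + marked + letters[pos + 1:]


def numbered_to_marked(text):
    """Convert e.g. 'Shang4hai3' into its accented form."""
    return re.sub(
        r"([A-Za-zü]+)([1-5])",
        lambda m: mark_syllable(m.group(1), m.group(2)),
        text,
    )


print(numbered_to_marked("Shang4hai3"))
print(numbered_to_marked("Huan2cheng2 Bei3 Lu4"))
```

This only works when every syllable carries its own tone digit, as in the Shang4hai3 example; it does not try to split unnumbered Pinyin into syllables.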


It would be feasible, given enough disk space etc., for t@h to render
captions as a separate transparent tile set.  This can then be layered on
top of the map using an OpenLayers overlay layer.

It then becomes possible to generate caption layers for multiple languages
and either manually or automatically switch them on/off depending on who the
user is and what part of the world they are looking at.  So the English
caption layer would be generated using the name:en tag, but fall back to the
name tag if there is no name:en tag.  A Chinese caption layer would use the
name:zh tag where there is one, falling back to name where there isn't one.
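
As a rough illustration of that fallback rule, here is a small Python sketch. It assumes an object's tags are available as a plain dict; caption_for is a made-up helper name, not an existing t@h or osmarender function.

```python
def caption_for(tags, lang):
    """Pick the caption for one language layer.

    Uses name:<lang> when present and non-empty, otherwise falls back
    to the plain name tag. Returns None if the object has no name at all.
    """
    localized = tags.get(f"name:{lang}")
    if localized:
        return localized
    return tags.get("name")


# Tags taken from the examples in the quoted message.
shanghai = {"name": "上海", "name:en": "Shanghai", "name:zh": "上海"}
untranslated_road = {"name": "环城北路"}

print(caption_for(shanghai, "en"))           # uses name:en
print(caption_for(untranslated_road, "en"))  # falls back to name
```

The same function would serve every caption layer; only the lang argument changes per tile set.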

80n