[OSM-talk] Chinese name tagging

D Tucny d at tucny.com
Thu Aug 2 21:31:09 BST 2007


Hi Folks,

With AND 'major roads' data for China on its way and a slowly growing
collection of Chinese data, I just wanted to throw some things about
regarding tagging in Chinese that could also carry into other
non-'latin' script areas...

Why Chinese tagging is complicated
Written Chinese carries the meaning, not the pronunciation. There are
two main modern forms of written Chinese, Simplified and Traditional.
The main difference is, perhaps obviously, that a lot of characters in
Simplified Chinese have been simplified compared with their
traditional variants. Simplified Chinese is used in mainland China,
Singapore and Malaysia; Traditional Chinese is used in Hong Kong,
Macau and Taiwan.

That covers written Chinese. However, there are also multiple forms of
spoken Chinese. The main two are Mandarin and Cantonese: Mandarin is
used in mainland China, Taiwan, Singapore and Malaysia, and Cantonese
is used in Guangdong province, Hong Kong and Macau. There are many
forms of romanisation for these spoken forms of Chinese. The main one
for Mandarin is Pinyin; for Cantonese there seems to be no clear
leader.

To complicate things a bit further, Mandarin and Cantonese are both
tonal languages. Mandarin has 4 tones (or 5 if you include the neutral
tone) and Cantonese has 9. Marking these tones is optional in the
romanisation schemes too...

So, now that we know it's complex...

What do we need to cover?
Well, we'd want our maps to be useful to local folks, so we need
Chinese, and it's probably best to use the form of Chinese that is
used in the area, e.g. Simplified in mainland China, Traditional in
Hong Kong.

We probably also want our maps to be useful to people who don't know
any Chinese. English is used in lots of places as the international
alternative, so we probably want English where available, especially
where signs carry it too. Most road signs at least will have English
or Pinyin, depending on the place, alongside the Chinese characters.

We probably want to give people a chance of being able to say names
even if they don't understand the written characters, so a romanised
form would be useful, i.e. Pinyin where Mandarin is spoken. To really
give people a chance of being understood, we probably need to include
tones where we can. Tone numbers would be easy to render and useful
to those who know what the numbers refer to, but tone 'accents' would
probably be useful to more people...

In summary, that gives us Chinese, English, Pinyin (for
Mandarin-speaking areas) and Pinyin with tones (for Mandarin-speaking
areas)...

How do we tag it?

Using Shanghai as an example, its name is tagged like so...

name=上海
name:en=Shanghai
name:zh=上海
name:zh_py=Shanghai
name:zh_pyt=Shànghǎi
or
name:zh_pyt=Shang4hai3 (if you've never seen numeric tone markers
before, this probably isn't going to help your pronunciation much)
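
To illustrate how the two name:zh_pyt forms above relate, here is a
minimal Python sketch that turns numeric tone markers into tone
accents. The function names are hypothetical, and the placement rule
it uses (mark 'a' or 'e' if present, else the 'o' of 'ou', else the
last vowel) is an assumption of the sketch based on the usual Pinyin
convention. It does not handle 'v' or 'u:' typed in place of 'ü'.

```python
import re
import unicodedata

# Vowels that can carry a tone mark (\u00fc is 'u' with diaeresis).
VOWELS = "aeiou\u00fc"

# Combining marks for tones 1-4: macron, acute, caron, grave.
# Tone 5 (neutral) is written without a mark.
TONE_COMBINING = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}


def mark_syllable(syllable, tone):
    """Return one Pinyin syllable with its tone accent applied."""
    mark = TONE_COMBINING.get(tone)
    if mark is None:
        return syllable  # neutral tone: nothing to add
    lower = syllable.lower()
    if "a" in lower:
        idx = lower.index("a")
    elif "e" in lower:
        idx = lower.index("e")
    elif "ou" in lower:
        idx = lower.index("ou")  # index of the 'o'
    else:
        positions = [i for i, ch in enumerate(lower) if ch in VOWELS]
        if not positions:
            return syllable  # no vowel to mark, leave untouched
        idx = positions[-1]
    # Compose vowel + combining mark into a single accented character.
    accented = unicodedata.normalize("NFC", syllable[idx] + mark)
    return syllable[:idx] + accented + syllable[idx + 1:]


def numbers_to_accents(text):
    """Convert text such as 'Shang4hai3' to its accented form."""
    def replace(match):
        return mark_syllable(match.group(1), int(match.group(2)))

    # A run of letters followed by a tone digit is treated as a syllable.
    converted = re.sub(r"([^\W\d_]+)([1-5])", replace, text)
    return unicodedata.normalize("NFC", converted)


print(numbers_to_accents("Shang4hai3"))
```

Storing only one of the two forms and generating the other at render
time would be one way to cut down on the duplication.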

Got all our bases covered, but it looks pretty wasteful, with lots of
duplication, so let's take a real-world street name example...

name=环城北路
name:en=Huancheng North Road
name:zh=环城北路
name:zh_py=Huancheng Bei Lu
name:zh_pyt=Huánchéng Běi Lù
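
On the duplication point, the plain Pinyin value can in principle be
derived from the toned one rather than stored separately. The
following is a minimal Python sketch of that idea, not part of any
existing OSM tool; the function name strip_tones is hypothetical. It
removes only the four tone accents and keeps the diaeresis on 'ü'.

```python
import unicodedata

# Combining marks used for Pinyin tones 1-4: macron, acute, caron, grave.
# The diaeresis (U+0308) is deliberately not listed, so 'ü' survives.
TONE_COMBINING = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tones(pinyin):
    """Return Pinyin with tone accents removed, e.g. for name:zh_py."""
    # Decompose accented letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = "".join(ch for ch in decomposed if ch not in TONE_COMBINING)
    # Recompose so that 'u' + diaeresis becomes a single 'ü' again.
    return unicodedata.normalize("NFC", kept)


tags = {"name:zh_pyt": "Huánchéng Běi Lù"}
tags["name:zh_py"] = strip_tones(tags["name:zh_pyt"])
print(tags["name:zh_py"])
```

Going the other way (adding tones to plain Pinyin) is not possible
without knowing the characters, so the toned form is the one worth
storing if only one is kept.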

Rendering
Currently, only name is rendered on our public maps...

osmarender/t@h will render Chinese text if the machine rendering the
tile has a suitable Chinese font installed, but that varies...

mapnik renders nice boxes where Chinese characters should live as the
font doesn't have the characters and mapnik currently doesn't fall
back to another font...

It could perhaps be good to render one/some of the other forms. I've
done some renders with osmarender with Chinese and Pinyin with tones
on roads and it looked pretty good... T@H/mapnik folks?

Of course, if you are doing custom rendering, you can control what
gets rendered and how it gets rendered, and if you have the
information that you want to render already there, you'll be much
happier than if you have to make it/try to automatically generate
it/leave it out...
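
For a custom renderer, choosing what to draw could be as simple as a
fallback over the tags proposed above. This is a minimal Python
sketch; the function names pick_label and bilingual_label and the
default preference order are assumptions for illustration, not how
osmarender or mapnik currently behave.

```python
def pick_label(tags, preferred=("name:en", "name:zh_py", "name")):
    """Return the first non-empty value among the preferred keys."""
    for key in preferred:
        value = tags.get(key)
        if value:
            return value
    return None  # nothing usable to render


def bilingual_label(tags, secondary_key="name:zh_pyt"):
    """Return the local name followed by a romanised form, if present."""
    primary = tags.get("name")
    secondary = tags.get(secondary_key)
    if primary and secondary and primary != secondary:
        return primary + " " + secondary
    return primary or secondary


street = {
    "name": "环城北路",
    "name:en": "Huancheng North Road",
    "name:zh": "环城北路",
    "name:zh_py": "Huancheng Bei Lu",
    "name:zh_pyt": "Huánchéng Běi Lù",
}

print(pick_label(street))       # label for readers with no Chinese
print(bilingual_label(street))  # characters plus Pinyin with tones
```

Either function only works if the data is already tagged, which is
the argument for capturing these forms up front.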

So... there's my mind dump on the subject... any comments?

d

