[OSM-talk] Chinese name tagging

Jon Burgess jburgess777 at googlemail.com
Sun Aug 5 16:26:18 BST 2007


On Fri, 2007-08-03 at 04:31 +0800, D Tucny wrote:
> Hi Folks,
> 
> With AND 'major roads' data for China on its way and a slowly growing
> collection of Chinese data, I just wanted to throw some things about
> regarding tagging in Chinese that could also carry into other
> non-'latin' script areas...
> 
> Why Chinese tagging is complicated
> Written Chinese carries the meaning, not the pronunciation. There are
> two main modern forms of written Chinese, Simplified Chinese and
> Traditional Chinese, the main difference between the two is that,
> potentially obviously, a lot of characters in Simplified Chinese have
> been simplified compared with their traditional variants... Simplified
> Chinese is used in mainland China, Singapore and Malaysia, Traditional
> Chinese is used in Hong Kong, Macau and Taiwan... So that covers
> written Chinese, however, there are also multiple forms of spoken
> Chinese, the main two are Mandarin and Cantonese, Mandarin is used in
> mainland China, Taiwan, Singapore and Malaysia, Cantonese is used in
> the Guangdong province, Hong Kong and Macau. There are many forms of
> romanisation used with these spoken forms of Chinese, the main one for
> Mandarin is Pinyin, for Cantonese there seems to be no clear leader...
> To complicate things a bit further, Mandarin and Cantonese are both
> tonal languages, Mandarin has 4 tones (or 5 if you include the neutral
> tone) and Cantonese has 9, these tones are optionally used in the
> romanization schemes too...
> 
> So, now that we know it's complex...
> 
> What do we need to cover?
> Well, we'd want our maps to be useful to local folks, so, we need
> Chinese, probably best to use the form of Chinese that is used in the
> area, e.g. Simplified in mainland China, Traditional in Hong Kong.
> We probably want our maps to be useful to people who don't know any
> Chinese, as English is used in lots of places as the international
> alternative, then we probably want to have English where available,
> especially where signs have English too, most road signs at least will
> have English or Pinyin depending on the place, along with the Chinese
> characters... We probably want to give people the chance of being able
> to say names even if they don't understand the written characters, so
> a romanised form would be useful, i.e. Pinyin where Mandarin is
> spoken, and to really give people a chance of being understood, we
> probably need to include tones where we can, tone numbers could be
> useful for those who know what those numbers refer to and would be
> easy to render, but tone 'accents' would probably be useful to
> more people...
> In summary, that gives us, Chinese, English, Pinyin (for Mandarin
> speaking areas), Pinyin with tones (for Mandarin speaking areas)...
> 
> How do we tag it?
> 
> Using Shanghai as an example, its name is tagged like so...
> 
> name=上海
> name:en=Shanghai
> name:zh=上海
> name:zh_py=Shanghai
> name:zh_pyt=Shànghǎi
> or
> name:zh_pyt=Shang4hai3 (if you've never seen numeric tone markers
> before, this probably isn't going to help your pronunciation much)
> 
> Got all our bases covered, but, it looks pretty wasteful, lots of
> duplication, so, let's take a real world street name example...
> 
> name=环城北路
> name:en=Huancheng North Road
> name:zh=环城北路
> name:zh_py=Huancheng Bei Lu
> name:zh_pyt=Huánchéng Běi Lù
> 
> Rendering
> Currently, only name is rendered on our public maps...
> 
> osmarender/t@h will render Chinese text if the machine rendering the
> tile has a suitable Chinese font installed, but that varies...
> 
> mapnik renders nice boxes where Chinese characters should live as the
> font doesn't have the characters and mapnik currently doesn't fall
> back to another font...
> 
> It could perhaps be good to render one/some of the other forms, I've
> done some renders with osmarender with Chinese and Pinyin with tones
> on roads and it looked pretty good... T@H/mapnik folks?
> 
> Of course, if you are doing custom rendering, you can control what
> gets rendered and how it gets rendered, and if you have the
> information that you want to render already there, you'll be much
> happier than if you have to make it/try to automatically generate
> it/leave it out...
> 
> So... there's my mind dump on the subject... any comments?
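
On the two name:zh_pyt forms quoted above: the numeric and accented
spellings carry the same information, so in principle one can be derived
from the other. A rough sketch in Python, assuming every syllable is
followed by a tone digit (0 or 5 meaning neutral) and using the usual
placement rule for the mark. mark_syllable and numeric_to_accented are
hypothetical names for illustration, not part of any OSM tool.

```python
import re
import unicodedata

# Combining diacritics for Mandarin tones 1-4 (macron, acute, caron, grave).
COMBINING = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
VOWELS = "aeiou\u00fc"  # \u00fc is u-umlaut


def mark_syllable(syllable, tone):
    """Put the tone mark on one pinyin syllable; tone 0 or 5 (neutral) adds none."""
    if tone not in COMBINING:
        return syllable
    lower = syllable.lower()
    # Usual placement rule: 'a' or 'e' takes the mark, else the 'o' of 'ou',
    # else the last vowel of the syllable.
    if "a" in lower:
        pos = lower.index("a")
    elif "e" in lower:
        pos = lower.index("e")
    elif "ou" in lower:
        pos = lower.index("ou")
    else:
        positions = [i for i, ch in enumerate(lower) if ch in VOWELS]
        if not positions:
            return syllable  # no vowel found, leave the text untouched
        pos = positions[-1]
    # NFC composes letter + combining mark into one character and keeps case.
    marked = unicodedata.normalize("NFC", syllable[pos] + COMBINING[tone])
    return syllable[:pos] + marked + syllable[pos + 1:]


def numeric_to_accented(text):
    """Convert numeric-tone pinyin such as 'Shang4hai3' to accented pinyin."""
    return re.sub(
        r"([A-Za-z\u00dc\u00fc]+)([0-5])",
        lambda m: mark_syllable(m.group(1), int(m.group(2))),
        text,
    )


print(numeric_to_accented("Shang4hai3"))
print(numeric_to_accented("Huan2cheng2 Bei3 Lu4"))
```

Run on the two examples from the quoted mail, this should give back the
accented spellings shown there. It does not handle apostrophes, erhua or
input that omits tone digits.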

I experimented with rendering multiple name tags in Mapnik a little
while ago and concluded that we might be able to render a single
alternate name tag, but with more than 2 names per location the map
gets too cluttered. The code to use multiple name tags is in the latest
osm2pgsql code but commented out (look for compress_tag_name()).

As you mention, the appropriate alternate name is not something the
rendering code can really guess. If we wanted to go down this route then
I'd suggest an extra tag like mapnik:altname=name:zh to indicate the
preferred alternate name to show for a given object.
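
A minimal sketch of how a renderer might resolve such a tag, written in
Python for brevity rather than as actual osm2pgsql code. pick_labels is a
hypothetical helper, and mapnik:altname is only the tag proposed above;
it is an assumption here that an object's tags are available as a plain
key/value dictionary.

```python
def pick_labels(tags):
    """Return (primary, alternate) label text for one object.

    'tags' is a dict of OSM key/value pairs. The alternate is None when
    no mapnik:altname is set, when it points at a missing key, or when it
    would simply repeat the primary name.
    """
    primary = tags.get("name")
    alt_key = tags.get("mapnik:altname")  # proposed tag, e.g. "name:zh_pyt"
    alternate = tags.get(alt_key) if alt_key else None
    if alternate == primary:
        alternate = None  # do not draw the same text twice
    return primary, alternate


road = {
    "name": "环城北路",
    "name:en": "Huancheng North Road",
    "name:zh": "环城北路",
    "name:zh_pyt": "Huánchéng Běi Lù",
    "mapnik:altname": "name:zh_pyt",
}
print(pick_labels(road))
```

Capping the result at one alternate keeps to the two-names-per-location
limit mentioned above.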

Multiple transparent overlay layers are probably the best long-term
answer, but even then there are label placement issues because the
text labels have different sizes or orientations on the different
layers.

	Jon





