[OSM-dev] Need advice for localization of Mapnik/Osmarender/Data search

Arne Goetje arne at linux.org.tw
Thu Jul 10 11:55:32 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi list,

I need some advice on the best way to localize our own OSM server. The
goal is to have one layer per language for Mapnik and Osmarender, and an
improved search engine for the planet.osm data.

The facts: we already have
 * a dedicated server hosting a planet.osm excerpt of Taiwan, with
Mapnik set up.
 * localized name tags in the planet.osm data. Almost all data for
Taiwan uses English in the 'name' and 'ref' tags and Chinese in
'name:zh' and 'ref:zh'; some other local languages are tagged with the
appropriate language codes as well.

Now for Mapnik and Osmarender: on top of the default rendering layer
(which uses English, i.e. the normal 'name' and 'ref' tags), we want to
add one layer per language code. Each of these layers should use the
localized 'name:$lang' and 'ref:$lang' tags, with the plain 'name' and
'ref' tags as fallback.
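
To make the intended fallback rule concrete, here is a minimal Python
sketch. It is not Mapnik or Osmarender configuration; it only assumes
that an object's tags are available as a plain dictionary, and the
function name localized_tag is made up for illustration.

```python
def localized_tag(tags, key, lang):
    """Return the value of 'key:lang' if set, otherwise the plain 'key'.

    tags is assumed to be a dict of OSM tags for one object; the function
    returns None when neither the localized nor the plain tag exists.
    """
    value = tags.get(f"{key}:{lang}")
    if value:
        return value
    return tags.get(key)


# Example tags modelled on the Zhongzheng Rd. case from this message.
road = {"name": "Zhongzheng Rd.", "name:zh": "中正路"}
label_zh = localized_tag(road, "name", "zh")  # localized tag is used
label_ja = localized_tag(road, "name", "ja")  # falls back to 'name'
```

Whatever the renderers need in the end, each per-language layer would
have to apply this kind of rule to both 'name' and 'ref'.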

How do we do that?

Then the search engine: currently it only works well for addresses that
use spaces, but not for Han and Hangul scripts. The reason is that
Chinese, Japanese and Korean addresses don't contain spaces. Instead,
the levels (Street, Number, Village, City, County, etc.) are each marked
by a specific character. For example (Chinese as used in Taiwan): the
address Zhongzheng Rd. in Taipei would be written 台北市中正路 in one
string without spaces, where 台北市 stands for "Taipei City" (市 being
the character for City) and 中正路 stands for Zhongzheng Rd. (路 being
the character for Road).
In the planet.osm data we have 'is_in' and 'is_in:zh' tags, where the
Chinese version writes the address in the same way: Zhongzheng Rd. in
Taipei has the 'is_in:zh' value 台灣台北市 (meaning Taiwan, Taipei
City).
A more complex (but not the most complex) example: a search for
桃園縣八德市介壽路二段325巷1弄1衖 should find the alley
介壽路二段325巷1弄1衖 (Alley 1-1, Ln. 325, Jieshou Rd. Sec. 2), which
has the 'is_in:zh' tag value 台灣桃園縣八德市 (Bade City, Taoyuan
County, Taiwan).
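
As a rough illustration of searching without spaces, the following
Python sketch splits an address at the level characters and checks a
query against 'is_in:zh' plus the Chinese name. The set of level
characters is only inferred from the examples above and is incomplete,
the function names are made up, and storing the alley's Chinese name in
'name:zh' is an assumption.

```python
import re

# Characters that close an address level. Inferred from the examples in
# this message (county, city, road, section, lane, alley); a real list
# would need more entries and proper review.
DESIGNATORS = "縣市路段巷弄衖"
_LEVEL = re.compile(f"[^{DESIGNATORS}]+[{DESIGNATORS}]")


def split_levels(address):
    """Split a space-less address into its levels, e.g. 台北市 + 中正路."""
    return _LEVEL.findall(address)


def matches_feature(query, tags):
    """True if the query is the tail of is_in:zh followed by name:zh."""
    full = tags.get("is_in:zh", "") + tags.get("name:zh", "")
    return bool(query) and full.endswith(query)


alley = {"name:zh": "介壽路二段325巷1弄1衖", "is_in:zh": "台灣桃園縣八德市"}
levels = split_levels("桃園縣八德市介壽路二段325巷1弄1衖")
found = matches_feature("桃園縣八德市介壽路二段325巷1弄1衖", alley)
```

A naive split like this goes wrong as soon as a place or road name
itself contains one of the level characters, which is one reason the
search code needs real knowledge of the address patterns.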

So, we need to enhance the search engine code to
 a) not rely on spaces as delimiters
 b) know the correct and possible alternate address schemes for Han and
Hangul scripts
 c) handle every possible way of writing an English transliteration
(for example 'Jieshou Rd. Sec. 2' could also be written 'Sec. 2 Jieshou
Rd.', 'Sec. 2, Jie-Shou Rd.', and in multiple other ways)
 d) handle spelling variants in English transliteration (for example
the road name Zhongzheng Rd. (中正路) can also be written ZhongZheng
Rd., Zhong-Zheng Rd., Jhongjheng Rd., Jungjeng Rd., Chung-cheng Rd., and
many more). Many municipalities in Taiwan use alternate spelling
systems, as there is no single standard but many different ways to
transliterate Chinese characters into English. People's name cards can
also contain spellings which you won't find on street signs anymore (at
my old company, my name card had the street name written as Tzu-You
Rd., although the city administration had changed the spelling on the
road signs to Ziyou Rd.).
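
Points c) and d) could be approached by reducing every romanized name to
a comparison key before matching. The Python sketch below is only one
possible way to do that: the alias table is a tiny hand-made sample
built from the spellings mentioned above, and both ALIASES and
canonical_key are hypothetical names.

```python
import re

# Sample alias table: variant spelling -> one canonical spelling. Keys
# are written the way they look after lowercasing and removing hyphens.
ALIASES = {
    "jhongjheng": "zhongzheng",
    "jungjeng": "zhongzheng",
    "chungcheng": "zhongzheng",
    "tzuyou": "ziyou",
}


def canonical_key(name):
    """Return a key that ignores case, hyphens, punctuation, word order
    and known spelling variants of a romanized name."""
    text = name.lower().replace("-", "")
    tokens = [t for t in re.split(r"[\s.,]+", text) if t]
    tokens = [ALIASES.get(t, t) for t in tokens]
    return " ".join(sorted(tokens))


same_order = canonical_key("Jieshou Rd. Sec. 2") == canonical_key("Sec. 2, Jie-Shou Rd.")
same_spelling = canonical_key("Chung-cheng Rd") == canonical_key("ZhongZheng Rd.")
```

Sorting the tokens throws away word order completely, which is a blunt
choice; it would also treat genuinely different names made of the same
words as equal, so a real implementation would probably need something
more careful.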

It would be great if we could feed those improvements back into the main
OSM project, so that the search on the main OSM website also delivers
the correct results. So the question is: where is the code we need to
enhance, and how should we coordinate the work? I could provide a list
of aliases for road names, a list of the Chinese characters used to
classify address levels (County, City, Village, etc.), and an
explanation of the possible address patterns.

Cheers
Arne
- --
Arne Götje (高盛華) <arne at linux.org.tw>
PGP/GnuPG key: 1024D/685D1E8C
Fingerprint: 2056 F6B7 DEA8 B478 311F  1C34 6E9F D06E 685D 1E8C
Key available at wwwkeys.pgp.net.   Encrypted e-mail preferred.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIdeqkbp/QbmhdHowRAqDBAKDpktXk1L9axzdpUWF5BEZMatcfswCgtgE6
t0WnMiJjZH8N74IHdlt6w/g=
=j6+S
-----END PGP SIGNATURE-----



