[OSM-dev] Need advise for localization of Mapnik/Osmarender/Data search

Arne Goetje arne at linux.org.tw
Thu Jul 10 16:52:39 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

wolschon wrote:
> If you want an English map, you should use
> "name:en_UK", fall back to "name:en" and then fall back to "name".
> "name" will give you the most common name, which need not be English at
> all.

Well, we don't have any en_UK tags in Taiwan, so it would be :en with a
fallback to 'name'. And if we used a regional tag at all, it would be
en_US. However, the transliterations cannot really be called "English",
because they have nothing to do with English. They are just an aid for
foreigners so that they know how to pronounce the street names. And as I
already stated, there exist multiple systems, and revisions thereof,
that accomplish that goal more or less well. (Sometimes this just leads
to confusion, especially since every municipality is free to use its
system of choice, and I'm not even talking about the spelling mistakes
now... :( )

Anyway, can you give me a code example of how to implement the
language tag and its fallback for Mapnik and Osmarender?
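
For illustration, the lookup order discussed above can be sketched in
plain Python. This is not Mapnik or Osmarender configuration (neither is
shown in this thread); the function name `localized_name` and the sample
tags are assumptions made up for the example.

```python
def localized_name(tags, preferred=("name:en",)):
    """Return the first non-empty value among the preferred keys, else 'name'.

    `tags` is a plain dict of OSM tags; `preferred` lists the language
    keys to try, most specific first (e.g. "name:en_US", then "name:en").
    """
    for key in (*preferred, "name"):
        value = tags.get(key)
        if value:
            return value
    return None  # the object has no usable name at all


# Illustrative tags only, not taken from the database.
road = {"name": "中正路", "name:en": "Zhongzheng Rd."}
print(localized_name(road))
print(localized_name({"name": "中正路"}))
```

A renderer would need the same order expressed in its own rule or query
language; the sketch only pins down the intended behaviour.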

> Thanks for letting us know about the address details.
> They are most interesting.

I didn't even go fully into the details... I just scratched the
surface... ;) Addresses in Japan and Korea look similar, by the way, but
use different syntax and characters.

>>> So, we need to enhance the search engine code to
>>>  a) not rely on spaces as delimiters
>>>  b) for Han and Hangul scripts know the correct and possible alternate
>>> address schemes
> 
> A simple way would be to add 市 and 路 to the delimiters.
> A better way would be to try different MessageFormats and stop on the first
> that can parse a complete address and convert it into country(opt), state(opt), city, street, number(opt), additional(opt).

MessageFormats sounds good.
For example: the city field can have 市 City, 鄉 Township (large) or 鎮
Township (small), depending on the number of inhabitants, optionally
followed by 區 District, 村 Village (large) or 里 Village (small).

Where would I put such code?
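
As a rough sketch of the city-field rule described above (one of 市, 鄉
or 鎮, optionally followed by 區, 村 or 里), here is a regular expression
in Python. It is a stand-in for the MessageFormat idea, not an
implementation of it; `split_city` and the sample address are
assumptions, and the pattern would misfire on street names that
themselves contain one of the marker characters.

```python
import re

# City part ends in 市/鄉/鎮; the optional subdivision ends in 區/村/里.
CITY_RE = re.compile(
    r"(?P<city>[^市鄉鎮]+[市鄉鎮])"
    r"(?P<sub>[^區村里]+[區村里])?"
    r"(?P<rest>.*)"
)


def split_city(address):
    """Split an unspaced address into (city, subdivision or None, rest)."""
    m = CITY_RE.match(address)
    if m is None:
        return None  # no city marker found
    return m.group("city"), m.group("sub"), m.group("rest")


# Illustrative input only.
print(split_city("台北市大安區復興南路"))
print(split_city("台北市復興南路"))
```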

>>>  c) handle every possible English transliteration (for example 'Jieshou Rd.
>>> Sec. 2' could also be written 'Sec. 2 Jieshou Rd.' and in multiple other
>>> ways (Sec. 2, Jie-Shou Rd., etc.))
> 
> Here it would be much better to convert street names into a canonical form
> and search on that one.
> e.g. ("Rd","Rd.", "rd", "rd.")=>"Road", "-"=>"", ...
> I am doing this for German addresses already and it seems to work fine.
> ("Straße")=>"Strasse"
> it also corrects spellings on the fly, e.g.  ("Alle", "Alee")=>"Allee"

Yes, I had something like this in mind. However, they would map to the
Chinese names, because those are the only consistent values...
Transliterated names on street signs differ greatly throughout the whole
country, and we usually follow the policy of putting into the database
whatever is printed on the street signs (if there is a transliteration
at all; if not, we use the same spelling system as the surrounding
street names).
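
A minimal sketch of the canonical-form idea quoted above, using only the
two rules given there ("Rd" variants become "Road", hyphens are
dropped). The name `canonicalize` and the choice to lower-case the
result are assumptions; word order (such as "Sec. 2" before or after the
name) is not handled.

```python
# Abbreviation table; only the "rd" rule comes from the thread.
ABBREVIATIONS = {"rd": "road"}


def canonicalize(street):
    """Lower-case a romanized street name, drop hyphens, expand abbreviations."""
    words = []
    for word in street.replace("-", "").split():
        key = word.rstrip(".,").lower()  # "Rd." and "rd" both become "rd"
        words.append(ABBREVIATIONS.get(key, key))
    return " ".join(words)


print(canonicalize("Jie-Shou Rd."))
print(canonicalize("Jieshou rd"))
```

Both calls yield the same string, so a search can compare canonical
forms instead of raw input.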

>>>  d) handle spelling variants in English transliteration (for example the road
>>> name Zhongzheng Rd. (中正路) can also be written ZhongZheng Rd.,
>>> Zhong-Zheng Rd., Jhongjheng Rd., Jungjeng Rd., Chung-cheng Rd., and many
>>> more). Many municipalities in Taiwan use alternate spelling systems, as
>>> there exists no standard but many different ways to transliterate
>>> Chinese characters into English. And people's name cards can also
>>> contain spelling systems which you won't find on street signs anymore
>>> (in my old company, my name card had the street name written as Tzu-You
>>> Rd., although the City administration changed the spellings on the road
>>> signs to be Ziyou Rd.).
> 
> Can we push these into the database as alternative names or does the
> restriction that most tools cannot work with multiple values for one
> key present a problem here?

I would not want to store all possible alternative spellings for each
and every road segment... that's too much redundant data. Instead, have a
library where all possible spellings for each road name are stored and
let the search engine use that one. A lot of road names are recycled in
Taiwan... you will find the same road names in nearly every
municipality. Only a handful differ.
Example:
The general road name pattern is "name (suffix) Road|Street (Section
1-8)" (where suffix and section are optional)
The most common names include for example:
"Zhongzheng", "Zhongshan", "Fuxing", "Juguang", "Guangfu", "Minsheng",
"Minquan", "Minzu", "Xinsheng", "Chenggong" (all of which can have
multiple spellings, depending on the municipality (and sometimes even
differ within the same municipality for the same road :( ).)
The suffix can be "North", "South", "West", "East" or an ordinal (1st,
2nd, 3rd, 4th, ..., 20th).
The section is added for very long roads and counts from 1 to 8, with
Section 1 being in the center of the municipality and higher numbers
further outside.
The above mentioned names for example can be found in basically every
municipality, with or without suffixes and sections.

Therefore, we can just make the mapping once and reuse it.
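
To make the pattern and the shared spelling library concrete, here is a
hedged sketch in Python. The table holds only the 中正 spellings listed
earlier in this message; `parse_road`, `ROAD_NAME_VARIANTS` and the
exact regular expression are assumptions, and real input would need more
abbreviations and orderings than this accepts.

```python
import re

# Spelling library: Chinese base name -> known romanizations.
# Filled in once and reused for every municipality.
ROAD_NAME_VARIANTS = {
    "中正": ["Zhongzheng", "Zhong-Zheng", "Jhongjheng", "Jungjeng", "Chung-cheng"],
}


def _fold(name):
    """Ignore case and hyphens when looking up a spelling."""
    return name.replace("-", "").lower()


VARIANT_INDEX = {
    _fold(v): zh for zh, variants in ROAD_NAME_VARIANTS.items() for v in variants
}

# name (suffix) Road|Street (Section 1-8); suffix and section are optional.
ROAD_RE = re.compile(
    r"^(?P<name>\S+)"
    r"(?: (?P<suffix>North|South|East|West|\d+(?:st|nd|rd|th)))?"
    r" (?P<kind>Road|Rd\.|Street|St\.)"
    r"(?: (?:Section|Sec\.) (?P<section>[1-8]))?$"
)


def parse_road(text):
    """Return the parts of a romanized road name, plus the Chinese base name."""
    m = ROAD_RE.match(text)
    if m is None:
        return None
    parts = m.groupdict()
    parts["name_zh"] = VARIANT_INDEX.get(_fold(parts["name"]))  # None if unknown
    return parts


print(parse_road("Chung-cheng Rd. Sec. 2"))
print(parse_road("Zhongshan North Road Section 1"))
```

The search would then run against the Chinese name, which is the only
consistent value in the data.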

> 
>> Full text search makes transliteration a piece of cake and makes it case
>> insensitive. You would need to write your own "sounds-like" script on
>> the database for cases like: Zhong-Zheng Rd., Jhongjheng Rd., Jungjeng
>> Rd., Chung-cheng Rd., where I would advise you to maintain a separate data
>> table.
> 
> Sounds good. Do you have some rules that we can put into the OSM-wiki for
> other people doing address search on OSM to use? (I would be interested for
> the AdressDBPlaceFinder in Traveling-Salesman.)
> 
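
One possible shape for such a "sounds-like" key, sketched in Python
rather than as a database script. The folding rules are assumptions
derived only from the four example spellings quoted above; they are not
an established romanization mapping, and they will both miss other
variants (Tzu-You versus Ziyou, for instance) and merge names that are
really different.

```python
import re


def romanization_key(name):
    """Fold a romanized road name to a crude key for fuzzy matching."""
    key = re.sub(r"[^a-z]", "", name.lower())  # drop hyphens, spaces, dots
    key = re.sub(r"zh|jh|ch", "j", key)        # treat these initials alike
    key = key.replace("ong", "ung")            # treat -ong and -ung alike
    return key


for spelling in ("Zhong-Zheng", "Jhongjheng", "Jungjeng", "Chung-cheng"):
    print(spelling, romanization_key(spelling))
```

Names that share a key could then be looked up in the separate table
suggested above.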

Cheers
Arne

- --
Arne Götje (高盛華) <arne at linux.org.tw>
PGP/GnuPG key: 1024D/685D1E8C
Fingerprint: 2056 F6B7 DEA8 B478 311F  1C34 6E9F D06E 685D 1E8C
Key available at wwwkeys.pgp.net.   Encrypted e-mail preferred.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIdjBHbp/QbmhdHowRAuZYAKDpAbA/D5acALEuiI6epr1kTdrEAgCg5xUp
nw/WVznGSwlgU858aTOrLRg=
=gYhp
-----END PGP SIGNATURE-----



