[Talk-in] Automating OSM translation into Indic languages

Srikanth Lakshmanan srik.lak at gmail.com
Wed Apr 8 02:49:08 UTC 2015


I modified the script a little and got a list of places with their translated
names. The gist and CSV data files are at [1]. Change the bbox / query params
to get other names.
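
For readers who do not want to open the gist, here is a minimal sketch of the lookup step only, modelled on the wbgetentities query quoted further down in this thread. It is not the script from the gist. The function names, the User-Agent string and the language list are assumptions made for illustration; note that the sketch uses `kn`, the ISO 639-1 code for Kannada.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
# Assumed language set; adjust as needed.
LANGS = ["hi", "kn", "ml", "ta", "or", "mr", "gu", "te"]


def build_url(title, langs=LANGS):
    """Build a wbgetentities URL for an English Wikipedia title."""
    params = {
        "action": "wbgetentities",
        "sites": "enwiki",
        "titles": title,
        "languages": "|".join(langs),
        "props": "labels",
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def extract_labels(response):
    """Return {language: label} from a parsed wbgetentities response.

    Titles with no Wikidata item come back without a "labels" key,
    so they simply yield an empty dict.
    """
    labels = {}
    for entity in response.get("entities", {}).values():
        for lang, item in entity.get("labels", {}).items():
            labels[lang] = item["value"]
    return labels


def fetch_labels(title):
    """Network call: look up one title and return its labels."""
    # Hypothetical User-Agent; use one that identifies your own tool.
    req = Request(build_url(title), headers={"User-Agent": "osm-indic-names-sketch/0.1"})
    with urlopen(req) as resp:
        return extract_labels(json.load(resp))
```

Calling `fetch_labels("Bangalore")` would return whatever labels Wikidata currently holds for the item linked to that English Wikipedia title.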

Observations on the data:
1. Wikidata querying doesn't honour redirects, so Bengaluru, Mysuru, et al.
don't get results from Wikidata. We probably need to use the Wikipedia API to
check whether the title is a redirect, then use the redirect target to get the
translated name. I was too lazy to do this in the first pass.
2. Need for manual verification:

A. Place names can be the same as ordinary words, and we might have got the
translation of that word through Wikidata. Ex: kama,en | काम,hi. Or the
result might contain extra disambiguation terms which are not required on a
map. Ex: Anuradhapura,ml,അനുരാധപുരം (നഗരം)
B. A place might be known only for one thing, for which the Wikipedia article
was created and interwiki linked. This is actually Wikipedia's problem, but we
inherit it since we use their data. Ex: Kalasa,en |
Kalasa,kn,Kalgundi Sri Marulasidheshwara swami temple.
C. OSM data itself contains names in non-Latin script in name tags. I didn't
see much of this for Indian towns / cities, but it is the case for many
Bangladesh / Nepal towns. Is there a discussion about which language should be
used in the name tag? Needless to say, this script cannot get Indic names for
such places, as English Wikipedia will not have pages titled in non-Latin script.
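
The redirect check from observation 1 could be sketched roughly as below, using the MediaWiki `action=query` endpoint with `redirects=1`. This is an illustration under assumptions, not part of the gist: the function names and User-Agent are made up, and the resolver does a single pass over the `normalized` and `redirects` lists in the response.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"


def redirect_query_url(title):
    """Build a query URL that asks the API to resolve redirects."""
    params = {"action": "query", "titles": title, "redirects": 1, "format": "json"}
    return WIKIPEDIA_API + "?" + urlencode(params)


def resolve_title(title, response):
    """Return the title after applying normalization and redirect mappings.

    Titles that are not redirects are returned unchanged.
    """
    query = response.get("query", {})
    for key in ("normalized", "redirects"):
        mapping = {m["from"]: m["to"] for m in query.get(key, [])}
        title = mapping.get(title, title)
    return title


def fetch_canonical_title(title):
    """Network call: ask English Wikipedia where a title redirects to."""
    # Hypothetical User-Agent; use one that identifies your own tool.
    req = Request(redirect_query_url(title), headers={"User-Agent": "osm-indic-names-sketch/0.1"})
    with urlopen(req) as resp:
        return resolve_title(title, json.load(resp))
```

The resolved title can then be passed to the Wikidata lookup in place of the name taken from OSM.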

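Some of the cases under observation 2 can be pre-flagged before a human looks at them. The sketch below is a heuristic only, with hypothetical function and flag names: it marks labels containing a parenthetical term, labels not written in the script expected for the language, and source names that are not in Latin script. It cannot catch a case like kama / काम, where the label looks fine but means something else, so manual review is still needed.

```python
import re
import unicodedata

# Unicode character-name prefix expected for each language code (assumed set).
SCRIPT_PREFIX = {
    "hi": "DEVANAGARI", "mr": "DEVANAGARI", "kn": "KANNADA", "ml": "MALAYALAM",
    "ta": "TAMIL", "or": "ORIYA", "gu": "GUJARATI", "te": "TELUGU",
}


def uses_script(text, prefix):
    """True if every letter or combining mark in text belongs to the script."""
    for ch in text:
        if unicodedata.category(ch)[0] in "LM":
            if not unicodedata.name(ch, "").startswith(prefix):
                return False
    return True


def review_flags(name, lang, label):
    """Return a list of reasons why a (name, lang, label) row needs a closer look."""
    flags = []
    if re.search(r"\(.*\)", label):
        flags.append("disambiguation")
    if not uses_script(label, SCRIPT_PREFIX[lang]):
        flags.append("wrong-script")
    if not uses_script(name, "LATIN"):
        flags.append("non-latin-source")
    return flags
```

With the examples from this mail, the Anuradhapura row is flagged for its parenthetical term and the Kalasa row for a Kannada label written in Latin script, while the kama row passes unflagged.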
I agree with Sajjad that this is going to be a tedious manual task (~6000
strings to look up) and we need a good web interface. I am looking at
crowdcrafting / PyBossa. But if a custom webapp can be built which uploads
changes directly to OSM, nothing like it.

[1] https://gist.github.com/srikanthlogic/21368ec570608ab15c0f

On Tue, Apr 7, 2015 at 11:32 AM, Sajjad Anwar <me at sajjad.in> wrote:

> This is great.
> Aruna, we can use the Wikidata to get a first pass of the translation and
> then present the spreadsheet view for someone to eyeball and add missing?
> Cheers,
> Sajjad.
> On Tue, Apr 7, 2015 at 10:59 AM, Aruna S <safincrazy at gmail.com> wrote:
>> On Mon, Apr 6, 2015 at 2:40 PM, Srikanth Lakshmanan <srik.lak at gmail.com>
>> wrote:
>>> Hello,
>>> Great work, I have been thinking about this for some time. I am of the
>>> opinion that place names (towns / villages etc.) should be translated and
>>> not transliterated. Arun has a point about locality addresses, as people
>>> might be so used to English that they find translations in their own
>>> language unusable.
>>> For place names, would it be a good idea to run a script which can look
>>> up Wikidata, extract names in multiple languages and update OSM? Below is a
>>> sample query for 'Bangalore' in multiple languages.
>>> [1]
>>> https://www.wikidata.org/w/api.php?action=wbgetentities&sites=enwiki&titles=Bangalore&languages=hi|ka|ml|ta|or|mr|gu|te&props=labels&format=xml
>> This seems like a wonderful idea. I'll use this while working on the
>> translation. Thanks. :)
>> _______________________________________________
>> Talk-in mailing list
>> Talk-in at openstreetmap.org
>> https://lists.openstreetmap.org/listinfo/talk-in
> --
> Sajjad Anwar http://geohacker.in <http://sajjad.in/>
