[Talk-in] Automating OSM translation into Indic languages

Sajjad Anwar me at sajjad.in
Mon Apr 6 05:14:00 UTC 2015


This is awesome and timely.

Transliteration should definitely stay out of OSM. Since much of the
tracing itself is a manual effort, it's okay to ask people to translate
manually too. Automatic translation is complicated and is not going to take
us far. We were thinking of a tool that lists all name tags within a
bounding box in a spreadsheet-like view, where the user fills in the
language-specific tag and hits save.
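
To make the idea concrete, here is a rough Python sketch of the data side of
such a tool. It only builds an Overpass QL query for named nodes in a bounding
box and turns Overpass-style JSON elements into spreadsheet rows. The function
names are made up for illustration, and fetching the data and saving edits
back to OSM are deliberately left out.

```python
import csv
import io


def overpass_query(south, west, north, east):
    """Overpass QL for every node with a name tag inside the bounding box."""
    return (
        "[out:json];"
        f'node["name"]({south},{west},{north},{east});'
        "out body;"
    )


def name_rows(elements, lang):
    """One row per named element: id, name, and the current name:<lang> value.

    `elements` is assumed to look like the "elements" list of an Overpass
    JSON response. A missing name:<lang> tag becomes an empty cell to fill in.
    """
    key = f"name:{lang}"
    rows = []
    for el in elements:
        tags = el.get("tags", {})
        if "name" in tags:
            rows.append({"id": el["id"], "name": tags["name"], key: tags.get(key, "")})
    return rows


def to_csv(rows, lang):
    """Serialise the rows as CSV text that opens in any spreadsheet program."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "name", f"name:{lang}"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Rows with an empty last column are the ones still waiting for a manual
translation.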

Want to take a stab at this? I can help.


On Sun, Apr 5, 2015 at 1:51 AM, I Chengappa <imchengappa at gmail.com> wrote:

> Hi Aruna
>  There's a lot of interesting material in your post. But first, can you
> clarify: are you seeking to add the transliterations to the OSM database,
> or to add them when rendering? It would be worth keeping in mind this
> 'guideline' - [
> http://wiki.openstreetmap.org/wiki/Names#Avoid_transliteration], which I
> take to mean 'don't add a transliteration if a machine can transliterate
> on the fly for you'. (I've been adding transliterations myself in ISO 15919,
> despite this).
>  Regards, indigomc (I. M. Chengappa)
> On 4 April 2015 at 19:24, Aruna S <safincrazy at gmail.com> wrote:
>> Hello!
>> Long email warning.
>> I've been thinking a little bit about automating the translation of maps
>> into multiple Indic languages ever since I saw the Kannada map at geoBLR in
>> March.
>> I started some work on it today, and I have lots of interesting things to
>> report. Right now I am mostly transliterating as opposed to translating but
>> if a dictionary of common words/tags can be compiled, upgrading the script
>> to translate instead of transliterating should be doable.
>> Here's the algorithm I followed:
>>    1. Get the nodes within a bounding box from OSM using the python
>>    wrapper for Overpass - overpy
>>    <http://python-overpy.readthedocs.org/en/latest/example.html> - This
>>    returns a collection of nodes and associated ID, tags, lat, lon and other
>>    attributes. This can also be repeated for ways by using the corresponding
>>    overpy query.
>>    2. Filter nodes that have tags
>>    3. From the result of the filter, identify nodes with Indic language
>>    tags - e.g. ["name:kn"]
>>    4. Transliterate the string value for tag["name:kn"] to another
>>    language - I used Tamil - and store it within tag["name:ta"] - I used the Indic
>>    transliterator <http://silpa.org.in/Transliteration> APIs from SILPA
>>    <http://transliteration.readthedocs.org/en/latest/> for this
>>    5. Create a new changeset and upload the result (the node with
>>    tag["name:ta"]) to OSM using osmapi
>>    <http://osmapi.divshot.io/#OsmApi.OsmApi.NodeUpdate>
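
A rough sketch of steps 2 to 4 above, in Python. This is not the script from
the post: nodes are plain dicts rather than overpy objects, the function names
are hypothetical, and `transliterate` is whatever function the caller passes
in (it stands in for the SILPA transliterator, whose API is not reproduced
here). Skipping nodes that already have the target tag is an extra assumption,
made so that manual work is not overwritten.

```python
def nodes_with_tag(nodes, key):
    """Steps 2 and 3: keep nodes that have tags and carry the given tag key."""
    return [n for n in nodes if n.get("tags") and key in n["tags"]]


def add_transliterated_tag(node, src_key, dst_key, transliterate):
    """Step 4: return a copy of the node's tags with dst_key filled in.

    `transliterate` is a caller-supplied function (text -> text).
    An existing dst_key is left untouched.
    """
    tags = dict(node["tags"])
    if dst_key not in tags:
        tags[dst_key] = transliterate(tags[src_key])
    return tags


def plan_updates(nodes, src_key, dst_key, transliterate):
    """Return (node id, new tags) pairs for the nodes whose tags would change."""
    updates = []
    for node in nodes_with_tag(nodes, src_key):
        new_tags = add_transliterated_tag(node, src_key, dst_key, transliterate)
        if new_tags != node["tags"]:
            updates.append((node["id"], new_tags))
    return updates
```

Calling `plan_updates(nodes, "name:kn", "name:ta", my_transliterator)` gives a
list that can be reviewed by a person before anything is uploaded in step 5.
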
>> I did it only for one node:
>> https://www.openstreetmap.org/edit?node=1118255762#map=19/12.99451/77.55430
>> *Advantages*
>>    - *Indic to Indic transliterations - ✓* The Indic transliterator APIs
>>    seem to convert quite effortlessly from one Indic language to another.
>>    Right now, support is available for Hindi, Tamil, Punjabi, Gujarati,
>>    Malayalam, Oriya, Bengali and Kannada. So, if a Kannada tag exists in OSM,
>>    the same text can be transliterated into multiple Indic languages using the
>>    naive algorithm I described above.
>> *Limitations*
>>    - *English to Indic transliterations - X*: Though the Indic
>>    Transliterator works for English to Indic transliterations as well, it is
>>    not very useful. This is because only English words that are in the CMU
>>    dictionary can be transliterated - which means that we can't
>>    transliterate "Raajaajeenagar", even if we had a custom tag for
>>    transliteration on OSM. On emailing the developer
>>    <http://thottingal.in/blog/about/> of the transliterator about
>>    extending the capabilities of English transliteration, I was told that
>>    extending the dictionary by adding additional words is one option. I am not
>>    sure how feasible this is, or whether it is any better than
>>    translating to one Indic language and transliterating+translating to the
>>    rest.
>>    - *Translations of English Words - X* - Right now, I am only able to
>>    transliterate words, but if a list of common words (I am guessing all the
>>    OSM tags, and other common words) could be compiled and translated into
>>    all the Indic languages, the translation process could be automated quite
>>    easily. This would require the algorithm to have 2 additional steps:
>>       1. From an Indic tag (i.e., an already translated tag), we would have
>>       to identify portions that are in the translations list, and leave them out
>>       of the transliteration process.
>>       2. For the word(s) identified in step 1, we must find a
>>       translation in the translations list for the language we are translating
>>       into. This must then be suffixed or prefixed with the transliterated
>>       portion. I am guessing suffix will be the norm, while prefixes might
>>       occasionally be necessary.
>>    - *Tracking node version numbers - X* - Right now, I am unable to
>>    read the version attribute of a node using the overpy API, so I entered
>>    the version number manually. Not sure if I am missing something. This is
>>    a "need-to-figure-out" issue more than anything, but it is very
>>    important for automatically uploading a node to the server: if
>>    there's a mismatch between the version number passed to the API and
>>    the version number on the server, the update is rejected.
>>    - *Which Indic language to begin transliterating from* - Issues might
>>    arise if a language like Tamil - where the letter for ka, kha, ga, gha etc.
>>    is the same - is used as the source when transliterating to, say, Hindi.
>>    But if we start from a language like Kannada or Hindi, this issue can
>>    probably be resolved easily.
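
The two extra translation steps listed above could look roughly like this in
Python. It is only a sketch under simple assumptions: names are split on
whitespace, `translations` is a hypothetical dict from a source-language word
to its target-language translation, and `transliterate` is supplied by the
caller. Word order is kept, so a translated word that was a suffix stays a
suffix; cases that need a prefix would need extra handling.

```python
def translate_name(name, translations, transliterate):
    """Translate words found in `translations`, transliterate everything else.

    name          -- the source-language name, e.g. the value of name:kn
    translations  -- dict: source-language word -> target-language word
    transliterate -- caller-supplied function (text -> text)
    """
    out = []
    for word in name.split():
        if word in translations:
            # Step 1: a known common word is kept out of transliteration...
            # Step 2: ...and replaced by its translation instead.
            out.append(translations[word])
        else:
            out.append(transliterate(word))
    return " ".join(out)
```

Compiling the `translations` table per language pair is the manual part; the
function itself stays the same for every target language.
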
>> The script is on Github
>> <https://github.com/anura28/Automate-Translations-OSM/blob/master/automateIndicTranslation.py>.
>> Feel free to fork it, use it, work on it, edit it and suggest changes,
>> different language, other possibilities, alternatives etc. Pull Requests
>> very welcome. :)
>> This is my first time writing code in Python, so advice on improving code
>> would be very welcome. Also, let me know if I'm missing something else,
>> obvious or subtle.
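
On the version-number limitation listed above: Overpass includes each
element's version when the query ends with `out meta;` instead of `out body;`.
A small sketch, assuming the response was requested as JSON (`[out:json]`) and
is available as text; the function name is made up for illustration.

```python
import json


def versions_by_id(overpass_json):
    """Map node id -> version from an Overpass JSON response made with `out meta;`."""
    data = json.loads(overpass_json)
    return {
        el["id"]: el["version"]
        for el in data.get("elements", [])
        if el.get("type") == "node" and "version" in el
    }
```

The version looked up this way can then be passed along with the node update
instead of being typed in by hand.
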
>> Thanks!
>> Warmly,
>> Aruna
>> _______________________________________________
>> Talk-in mailing list
>> Talk-in at openstreetmap.org
>> https://lists.openstreetmap.org/listinfo/talk-in

Sajjad Anwar http://geohacker.in <http://sajjad.in/>