[Talk-in] Automating OSM translation into Indic languages

Sun Apr 5 17:46:47 UTC 2015

>
> On 4 April 2015 at 19:24, Aruna S <safincrazy at gmail.com> wrote:
>
>> Hello!
>>
>> Long email warning.
>>
>> I've been thinking a little bit about automating the translation of maps
>> into multiple Indic languages ever since I saw the Kannada map at geoBLR in
>> March.
>>
>> I started some work on it today, and I have lots of interesting things to
>> report. Right now I am mostly transliterating as opposed to translating, but
>> if a dictionary of common words/tags can be compiled, upgrading the script
>> to translate instead of transliterating should be doable.
>>
>>
Fascinating stuff, Aruna; this opens up some great ideas to explore.

A few days back, while talking to an Uber taxi driver
<https://twitter.com/geohacker/status/583605884581883904>, he said he would
prefer English names on the map because he found the Kannada words for
common terms like block, street, road, hospital etc. unfamiliar. He was
happy to see the Kannada script in use though, so there is a definite use
for transliterated rather than translated English names.

Translated/localized names seem to be of more interest from an official
viewpoint, for the government and for use in education, but are probably not
practical for everyday use at this time. Eventually we will need a
mechanism to produce both translated and transliterated Indic maps.
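
As a rough illustration of the dictionary idea Aruna mentioned, here is a minimal Python sketch that swaps generic words (road, street) for Kannada equivalents from a small glossary and leaves everything else alone. The glossary, its two entries and the name translate_generic_words are hypothetical; a real word list would need review by native speakers, and the proper-noun part of a name would still have to go through a transliterator.

```python
import re

# Hypothetical English -> Kannada glossary of generic words in place names.
# Illustrative entries only; a real list needs review by native speakers.
GLOSSARY_KN = {
    "road": "ರಸ್ತೆ",
    "street": "ಬೀದಿ",
}


def translate_generic_words(name, glossary):
    """Replace glossary words (case-insensitively) and keep everything else.

    Proper nouns such as "Sampige" are not in the glossary and pass through
    unchanged; they would still need a separate transliteration step.
    """

    def replace(match):
        word = match.group(0)
        return glossary.get(word.lower(), word)

    # \w+ matches runs of word characters; spaces and punctuation are kept.
    return re.sub(r"\w+", replace, name)


print(translate_generic_words("Sampige Road", GLOSSARY_KN))  # Sampige ರಸ್ತೆ
```

Keeping the untranslated words intact means the output is a half-translated name; that is exactly the gap the transliteration step (or the driver's preference for transliterated English) would fill.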

>> Here's the algorithm I followed:
>>
>>    1. Get the nodes within a bounding box from OSM using the python
>>    wrapper for Overpass - overpy
>>    <http://python-overpy.readthedocs.org/en/latest/example.html> - This
>>    returns a collection of nodes and associated ID, tags, lat, lon and other
>>    attributes. This can also be repeated for ways by using the corresponding
>>    overpy query.
>>    2. Filter nodes that have tags
>>    3. From the result of the filter, identify nodes with Indic language
>>    tags - eg:["name:kn"]
>>    4. Transliterate the string value for tag["name:kn"] to another
>>    language - I used Tamil - and store it within tag["name:ta"] - I used the Indic
>>    transliterator <http://silpa.org.in/Transliteration> APIs from SILPA
>>    <http://transliteration.readthedocs.org/en/latest/> for this
>>    5. Create a new changeset and upload the result (node with
>>    tag["name:ta"]) to OSM using osmapi
>>    <http://osmapi.divshot.io/#OsmApi.OsmApi.NodeUpdate>
>>
>> I did it only for one node:
>> https://www.openstreetmap.org/edit?node=1118255762#map=19/12.99451/77.55430
>>
>>
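
Steps 1 to 4 above could be sketched roughly as follows in Python. This is not Aruna's script: the Overpass query pushes the name:kn filter to the server instead of filtering locally, the Node class is a stand-in for overpy's node objects, and kannada_to_tamil is a deliberately naive code-point shift between the Kannada and Tamil Unicode blocks, swapped in for the SILPA transliterator. Only fetch_nodes needs network access, and the changeset upload in step 5 is left out.

```python
import unicodedata
from dataclasses import dataclass, field

# Kannada (U+0C80) and Tamil (U+0B80) blocks follow a largely parallel,
# ISCII-derived layout, so a letter usually sits at the same offset in both.
KANNADA_START, TAMIL_START = 0x0C80, 0x0B80
# Letters, signs and digits only; above U+0CF0 the two blocks diverge.
KANNADA_END = 0x0CF0


def build_overpass_query(south, west, north, east, tag="name:kn"):
    """Step 1 (variant): Overpass QL for nodes carrying `tag` in a bbox."""
    return f'node["{tag}"]({south},{west},{north},{east});out;'


def fetch_nodes(query):
    """Run the query with overpy; needs network access and `pip install overpy`."""
    import overpy  # imported lazily so the offline helpers work without it

    return overpy.Overpass().query(query).nodes


@dataclass
class Node:
    """Stand-in for overpy's node objects: only `id` and `tags` are used."""

    id: int
    tags: dict = field(default_factory=dict)


def kannada_to_tamil(text):
    """Naive transliteration by shifting Kannada code points into the Tamil block.

    Tamil lacks most voiced and aspirated consonants, so many shifted code
    points are unassigned; those characters are left unchanged here, whereas
    a real transliterator (e.g. SILPA's) maps them to the nearest Tamil letter.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if KANNADA_START <= cp < KANNADA_END:
            candidate = chr(cp - KANNADA_START + TAMIL_START)
            # unicodedata.name returns the default for unassigned code points
            if unicodedata.name(candidate, None) is not None:
                out.append(candidate)
                continue
        out.append(ch)
    return "".join(out)


def add_tamil_names(nodes):
    """Steps 2-4: for tagged nodes with name:kn, fill in name:ta."""
    updated = []
    for node in nodes:
        if not node.tags:  # step 2: skip untagged nodes
            continue
        kn = node.tags.get("name:kn")  # step 3: is there a Kannada name?
        if kn is None or "name:ta" in node.tags:  # never overwrite name:ta
            continue
        node.tags["name:ta"] = kannada_to_tamil(kn)  # step 4
        updated.append(node)
    return updated


nodes = [Node(1, {"name:kn": "ರಸ್ತೆ"}), Node(2), Node(3, {"name": "Example"})]
for node in add_tamil_names(nodes):
    print(node.id, node.tags["name:ta"])  # 1 ரஸ்தெ
```

Nodes that already carry a name:ta tag are skipped so the script never overwrites a human-entered name; that is a design choice of this sketch, not something from the original post.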

Sajjad and I have been discussing how to practically approach this with respect to
the OSM data model and tags. As indigomc mentioned, since
transliterations can be automated, it probably does not make sense to store
them in OSM itself, but rather in a separate dictionary that the rendering
engine can look up to switch to practically any language.
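
A minimal sketch of how a renderer might consult such an external dictionary when choosing a label, assuming a hypothetical side table keyed by language and then by the source name:kn string. Keying by the string rather than by OSM id means one entry covers every feature with the same name. The function name pick_label and its fallback order are assumptions, not an existing renderer API.

```python
# Hypothetical side table kept outside OSM: language code -> {source name: name}.
# Example data only.
SIDE_TABLE = {
    "ta": {"ರಸ್ತೆ": "ரஸ்தெ"},
}


def pick_label(tags, lang, side_table, source_key="name:kn"):
    """Choose the label a renderer could draw for `lang`.

    Fallback order (an assumption of this sketch): an explicit name:<lang>
    tag in OSM, then the external table looked up by the source name, then
    the default `name` tag (None if even that is missing).
    """
    explicit = tags.get(f"name:{lang}")
    if explicit:
        return explicit
    source = tags.get(source_key)
    if source is not None:
        looked_up = side_table.get(lang, {}).get(source)
        if looked_up:
            return looked_up
    return tags.get("name")


tags = {"name": "Example Road", "name:kn": "ರಸ್ತೆ"}
print(pick_label(tags, "ta", SIDE_TABLE))  # ரஸ்தெ (from the side table)
print(pick_label(tags, "hi", SIDE_TABLE))  # Example Road (no Hindi entry)
```

Letting an explicit name:<lang> tag win keeps any human-curated names in OSM authoritative over generated ones.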

It does not seem like anyone has thought much about how such a system could
work and India could be an ideal testing location since we have a practical
need for it.

One possibility is to generate a simple CSV of names and transliterated
names, which can be attached to the OSM geometries in a custom rendering
flow using Mapbox Studio. If you can prepare such a CSV, I'd be happy to
try this out.
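
A possible shape for that CSV, sketched in Python with the standard csv module. The column names (osm_type, osm_id, name, name:<lang>) and the record layout are hypothetical; how the file would be joined to geometries in Mapbox Studio is not shown here.

```python
import csv
import io


def write_names_csv(records, langs, out):
    """Write one CSV row per OSM element with its source and transliterated names.

    Each record is a dict with keys "osm_type", "osm_id", "name" and
    "translit" (language code -> transliterated name). This layout is a
    hypothetical choice for the sketch.
    """
    fieldnames = ["osm_type", "osm_id", "name"] + [f"name:{lang}" for lang in langs]
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for rec in records:
        row = {key: rec[key] for key in ("osm_type", "osm_id", "name")}
        for lang in langs:
            # A missing transliteration becomes an empty cell, not an error.
            row[f"name:{lang}"] = rec["translit"].get(lang, "")
        writer.writerow(row)


records = [
    # Example data only.
    {"osm_type": "node", "osm_id": 1, "name": "ರಸ್ತೆ", "translit": {"ta": "ரஸ்தெ"}},
]
buf = io.StringIO()
write_names_csv(records, ["ta", "hi"], buf)
print(buf.getvalue())
```

When writing to a real file instead of a StringIO, open it with newline="" as the csv module documentation recommends; otherwise extra blank lines can appear on Windows.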

-- 
 Arun Ganesh
(planemad) <http://en.wikipedia.org/wiki/User:Planemad>
 <http://j.mp/ArunGanesh>