[Talk-in] Automating OSM translation into Indic languages

Aruna S safincrazy at gmail.com
Mon Apr 6 07:12:20 UTC 2015


Planemad,
I'm hoping that solving the transliteration problem is easier than solving
the translation problem. Just auto-transliteration would be a nice initial
goal to have.

I'm looking for more libraries that can transliterate from English to at
least one Indic language. The developer of SILPA had a valid point about
English having a very weak link between pronunciation and spelling, so
words like "Queue" would be hard to transliterate using a generic
transliterator. There's a lot more reading up that I need to do before
reworking the Python script. I'll try to get the CSV extraction out soon.

Google's transliteration APIs have been deprecated
<https://developers.google.com/transliterate/v1/getting_started> and Yahoo
is going to stop maintaining its transliteration application
<http://transliteration.yahoo.com/> from April 9th. I am looking for other
Indic transliteration software that comes with APIs. (I have written to the
developer of Pramukh-IME <http://service.vishalon.net/pramukhtypepad.aspx>,
but it's not free/open source and may not have APIs.)
Suggestions/alternatives very welcome. :)

Indigomc,
Yes. :) I thought about that too.

For Indic->Indic transliterations, Tamil would generally be a bad choice
for the first instance of transliteration, since it lacks so many letters
that other Indic languages have.

What language do you think would be a good choice while transliterating for
the very first time? I have a feeling that Hindi/Kannada would be friendly
and harmless. If you look at the current transliteration on the node
<https://www.openstreetmap.org/edit?node=1118255762#map=19/12.99451/77.55430>,
the Kannada to Tamil transliteration automatically changes the Kannada
T(h)a to the ambiguous Tamil Ta/Da equivalent.
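
To make the "missing letters" point concrete, here is a minimal sketch. It
assumes the parallel layout of the Unicode Kannada (U+0C80..U+0CFF) and Tamil
(U+0B80..U+0BFF) blocks, where a letter sits at the same offset in each block
when the script has it. This is not how SILPA or the script mentioned above
works; the function names are made up for illustration.

```python
import unicodedata

KANNADA_START, KANNADA_END = 0x0C80, 0x0CFF
TAMIL_START = 0x0B80


def kannada_to_tamil_char(ch):
    """Return the Tamil character at the same block offset.

    Returns None when Tamil has no letter at that offset, and returns
    the input unchanged when it is not a Kannada character at all.
    """
    cp = ord(ch)
    if not (KANNADA_START <= cp <= KANNADA_END):
        return ch
    candidate = chr(cp - KANNADA_START + TAMIL_START)
    # With a default argument, unicodedata.name returns the default
    # (instead of raising ValueError) for unassigned code points.
    if unicodedata.name(candidate, None) is None:
        return None
    return candidate


def missing_in_tamil(text):
    """List the Unicode names of Kannada characters with no Tamil slot."""
    return [
        unicodedata.name(ch, f"U+{ord(ch):04X}")
        for ch in text
        if kannada_to_tamil_char(ch) is None
    ]


if __name__ == "__main__":
    # Kannada TA, THA, DA, DHA
    print(missing_in_tamil("\u0ca4\u0ca5\u0ca6\u0ca7"))
```

A real transliterator would need a fallback for the letters this reports
(for example, mapping both THA and DA to the single Tamil TA), which is
where the ambiguity on the node comes from.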

Sajjad,

> Transliteration definitely should stay out of OSM. Since much of the
> tracing itself is manual effort, it's okay to ask for translating manually.
> Automatic translation is not going to take us far, it's complicated. We
> were thinking of a tool that could list all name tags in like a spreadsheet
> based on a bounding box, and the user can fill in the language specific
> tag, hit save.
>
> Want to take a stab at this? I can help.

That would be so wonderful! Would this mean that we

   1. Extract all the name tags within a bounding box to a
   spreadsheet/document
   2. Host the document somewhere for people to edit.
   3. Update the map with the new tags?
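
Step 1 above could be sketched roughly as follows. This is only an
illustration under some assumptions: the name tags are fetched with an
Overpass QL query sent to an Overpass API instance (the request itself is
not shown), only nodes are considered, and the target tag `name:kn` and the
function names are placeholders. Steps 2 and 3 are not covered.

```python
import csv
import io


def build_overpass_query(south, west, north, east):
    """Build an Overpass QL query for all named nodes in a bounding box.

    Overpass expects the bounding box as (south, west, north, east).
    """
    bbox = f"{south},{west},{north},{east}"
    return f'[out:json];node["name"]({bbox});out body;'


def elements_to_csv(elements, target_tag="name:kn"):
    """Turn the "elements" list of an Overpass JSON response into CSV text.

    Each row has the element type, its id, the existing name, and the
    current value of the target language tag (empty if not yet filled in).
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["type", "id", "name", target_tag])
    for el in elements:
        tags = el.get("tags", {})
        if "name" not in tags:
            continue  # nothing to translate or transliterate
        writer.writerow(
            [el["type"], el["id"], tags["name"], tags.get(target_tag, "")]
        )
    return buf.getvalue()


if __name__ == "__main__":
    print(build_overpass_query(12.99, 77.55, 13.0, 77.56))
    # Hand-written sample in the shape of an Overpass response.
    sample = [{"type": "node", "id": 1, "tags": {"name": "Rajajinagar"}}]
    print(elements_to_csv(sample))
```

Keeping the type and id columns in the sheet is what would let the filled-in
values be matched back to the right objects when updating the map.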

If we have an additional field to crowdsource a transliteration of the
proper nouns in the name tag, from English into whichever Indic language we
pick for the first instance of transliteration, it could be useful if/when
we want to transliterate into other Indic languages later. Separating
proper nouns is very important, I think, since the SILPA transliterator
seems to transliterate English dictionary words quite well.
I will try to converse a little more with SILPA soon, and post updates.
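
As a rough sketch of what separating proper nouns could look like, assuming
a simple word-list lookup (the word list below is a tiny hypothetical
stand-in for a real English dictionary, and the function name is made up):

```python
# Hypothetical, deliberately tiny word list; a real run would load a
# proper English dictionary instead.
COMMON_WORDS = {
    "road", "main", "cross", "street", "temple",
    "school", "hospital", "bus", "stand", "park",
}


def split_name(name, common_words=COMMON_WORDS):
    """Split a name tag into (dictionary words, likely proper nouns).

    Tokens found in the word list are treated as dictionary words;
    everything else is assumed to be a proper noun that needs a
    crowdsourced transliteration.
    """
    dictionary, proper = [], []
    for token in name.split():
        if token.lower() in common_words:
            dictionary.append(token)
        else:
            proper.append(token)
    return dictionary, proper


if __name__ == "__main__":
    print(split_name("Rajajinagar Main Road"))
```
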
Warmly,
Aruna