[Talk-in] Automating OSM translation into Indic languages
Srikanth Lakshmanan
srik.lak at gmail.com
Tue Nov 1 08:07:14 UTC 2016
I have put the scripts and documented some steps here[1]. Let me know if
you have any issues using them for a different language.
The scripts need some cleanup / refinement to make them generic, and a full
Google Sheets UI would save users from modifying the scripts manually.
Indic maps - Translation priority:
I think we need to decide which nodes / ways (state and city names, national
highways, railway stations, rivers etc.) at India / state level need to
have translations in all official languages of India / the state, and then
target translating them as a priority list, so that we have usable Indic
maps at India level. The same can be repeated at state level, for the
languages spoken in each state, for lower levels of data.
Notes about data :
Google Translate might sometimes depend on Wikidata, so an error in
Wikidata will be carried over to the maps too. So some form of manual
verification is needed / advisable. Sometimes there are spelling
differences: Wikipedia may follow its own naming practices [imagine
locales here / the infamous anti-grantha debate in Tamil], while Google may
transliterate to the spelling most widely used online, and the spelling
used offline could be entirely different. So when we add a name:xx
translation, an OSM user needs to judge which form is best used in a map:
something that will help all map users of that language identify the place
while navigating / visualizing. Other alternate names, old names and short
forms in the language can also be added appropriately. Google Translate,
Wikidata and transliteration can all be only suggestions.
When the Wikidata and Google Translate outputs match, there is reasonable
assurance about the translation though. But it's not always easy to flag
either output as "wrong" when they don't match. I did not go through this
exercise anyway.
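The matching step could be sketched roughly as below. This is only an
illustration and not the script in the repository: the function names, the
row layout and the node ids are assumptions made up for the example, and the
normalization (Unicode NFC plus whitespace cleanup) is one plausible choice.

```python
import unicodedata


def normalize(name):
    """NFC-normalize and collapse whitespace so trivially different strings compare equal."""
    return " ".join(unicodedata.normalize("NFC", name).split())


def classify(wikidata_label, google_label):
    """Return 'match', 'mismatch' or 'missing' for one place name."""
    if not wikidata_label or not google_label:
        return "missing"
    if normalize(wikidata_label) == normalize(google_label):
        return "match"
    return "mismatch"


# Hypothetical rows: (OSM node id, Wikidata label, Google Translate output)
rows = [
    (1, "சென்னை", "சென்னை"),
    (2, "மதுரை", " மதுரை "),
    (3, "திருச்சிராப்பள்ளி", "திருச்சி"),
    (4, None, "கோயம்புத்தூர்"),
]

# Only agreeing names are candidates for upload; disagreements go to a human.
to_upload = [node for node, wd, gt in rows if classify(wd, gt) == "match"]
to_review = [node for node, wd, gt in rows if classify(wd, gt) == "mismatch"]
```

Names in to_review are not necessarily wrong in either source; they only
need a person who reads the language to pick the form that suits the map.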
Wikidata / Google overlap stats (which got uploaded yesterday)
Sri Lanka - 70 / 2200~ (Locale could be a factor here: WP articles
might be in the ta-lk locale, while Google might have transliterated the
names from the Sinhalese names)
Malaysia - 15 / 1350 (Lack of Wikidata entries is the main reason here; no
one has created enough Tamil WP articles for Malaysian locations)
Indian Cities - 36 / 137
Tamil Nadu - 175 / 2700~
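To put the figures above on a common scale, a small sketch follows. It
assumes each pair reads as "matching names / names considered", which is an
interpretation of the list and not something the scripts report; the totals
marked with ~ are approximate.

```python
# (matches, approximate total) as quoted in the list above
overlap = {
    "Sri Lanka": (70, 2200),
    "Malaysia": (15, 1350),
    "Indian Cities": (36, 137),
    "Tamil Nadu": (175, 2700),
}


def match_rate(matches, total):
    """Percentage of names where Wikidata and Google Translate agreed."""
    return 100.0 * matches / total


for region, (matches, total) in overlap.items():
    print(f"{region}: {match_rate(matches, total):.1f}%")
# Sri Lanka: 3.2%
# Malaysia: 1.1%
# Indian Cities: 26.3%
# Tamil Nadu: 6.5%
```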
Tool :
I am still thinking through different options and am undecided whether I
should continue using Google Sheets, or use the data obtained in the sheets
as a backend and build a simple UI which will upload to OSM directly using
the user's OSM credentials instead of using a script. Building an Overpass
query builder specific to this activity might also be useful. Feedback /
help welcome.
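As a rough idea of what such a query builder might look like, here is a
minimal sketch. The function name and its parameters are hypothetical, and
restricting the search to place nodes inside a country area is an assumption
for the example; the generated text is ordinary Overpass QL.

```python
def build_untranslated_query(lang, iso_country, place_types, timeout=60):
    """Build an Overpass QL query for named place nodes lacking name:<lang>."""
    place_regex = "|".join(place_types)
    return (
        f'[out:json][timeout:{timeout}];\n'
        f'area["ISO3166-1"="{iso_country}"][admin_level=2]->.a;\n'
        f'node["place"~"^({place_regex})$"]["name"][!"name:{lang}"](area.a);\n'
        f'out body;'
    )


# Example: cities and towns in India that have a name but no name:ml yet
query = build_untranslated_query("ml", "IN", ["city", "town"])
print(query)
```

The same function would cover another language or place type by changing
the arguments, e.g. "ta" with ["state", "city"] for a priority list.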
[1] https://github.com/srikanthlogic/osm-ta
On 1 November 2016 at 09:32, Arun Ganesh <arun.planemad at gmail.com> wrote:
> This is amazing Srikanth! Can you also share statistics of how many did
> not match, maybe it could be used to flag potentially incorrect
> translations on Wikidata.
>
> On Tue, Nov 1, 2016 at 8:42 AM, manoj k <manojkmohanme03107 at gmail.com>
> wrote:
>
>> Great. How can I test for Malayalam?
>>
>> Manoj.K/മനോജ്.കെ
>> www.manojkmohan.com
>>
>> On Tue, Nov 1, 2016 at 1:34 AM, Srikanth Lakshmanan <srik.lak at gmail.com>
>> wrote:
>>
>>> After loading the yet to be translated nodes into google sheets, I used
>>> the =GOOGLETRANSLATE() function to get google translations of place names
>>> and also queried wikidata to get wikidata translations. When both these
>>> matched, I extracted a list and uploaded through a script.[1] I also
>>> verified the changesets[2] that were created through script and did not
>>> find any issues.
>>>
>>> [1] https://gist.github.com/srikanthlogic/5148571367bb829b03c2170af59880c4
>>> [2] http://www.openstreetmap.org/user/SrikanthLogic/history
>>>
>>>
>>> On 6 October 2016 at 13:29, Srikanth Lakshmanan <srik.lak at gmail.com>
>>> wrote:
>>>
>>>>
>>>> On 7 April 2015 at 11:32, Sajjad Anwar <me at sajjad.in> wrote:
>>>>
>>>>> Aruna, we can use the Wikidata to get a first pass of the translation
>>>>> and then present the spreadsheet view for someone to eyeball and add
>>>>> missing?
>>>>>
>>>>
>>>> I got a spreadsheet view[0] of this using the OverpassToGoogleSheets
>>>> appscript[1] (which I plan to make into a complete add-on). Once the
>>>> ability to generate a spreadsheet for any language / area combination
>>>> is complete, the sheet can be used to build an HTML microtask UI where
>>>> the user can translate (with a small map view to give context).
>>>> Eventually, we could build something like what is envisioned at Map
>>>> Translation Interface[2].
>>>>
>>>> [0] https://docs.google.com/spreadsheets/d/1imONog35BeDTEuwRPgrRRDAF7pvbSHs1Ryv-18yfOaQ/edit#gid=201371839
>>>> [1] https://github.com/srikanthlogic/tangrams-indic/blob/gh-pages/Translator-Tools/appscript/OverpassToGoogleSheets.gs
>>>> [2] http://wiki.openstreetmap.org/wiki/Map_Translation_Interface
>>>>
>>>> --
>>>> Regards
>>>> Srikanth.L
>>>>
>>>
>>>
>>>
>>> --
>>> Regards
>>> Srikanth.L
>>>
>>> _______________________________________________
>>> Talk-in mailing list
>>> Talk-in at openstreetmap.org
>>> https://lists.openstreetmap.org/listinfo/talk-in
--
Regards
Srikanth.L