[OSM-talk] Revival: Multilingual Country-List

Peter Wendorff wendorff at uni-paderborn.de
Fri Feb 22 08:42:21 UTC 2013


On 21.02.2013 13:01, Hans Schmidt wrote:
> On 21.02.2013 12:36, Peter Wendorff wrote:
>> Well... if there's no localized name tag, then you may omit the 
>> name:xx tag for that language, as there's no alternative.
>> On the other hand name:de might be useful even then, as it's possible 
>> to translate programmatically if the software knows the 
>> language. The German suffixes -straße, -weg, -platz could be 
>> automatically translated to street, way and square; the (AFAIK) Swedish 
>> -gatan is street again, väg is way, and so on.
>> But if you try to translate something this way into another language 
>> without knowing the source language, it's much more difficult.
>>
> Why would you want to translate the street names? Do you want to 
> translate Paris' “Avenue des Champs-Élysées” to “Allee der 
> Champs-Élysées”? Nobody would know what it is anymore.
> Also, nobody wants to translate a “Lindenallee” in some minor German 
> town to “Linden avenue”. Also, automatic translation would be 
> error-prone.
For complete names you may be right, but for Natural Language Generation 
in tools based on OSM data it might be useful to translate parts of 
names. For the Lindenallee this might produce "Go down the 
avenue...", where "avenue" is not a classification given by tags but 
is derived from the name alone.
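
A minimal sketch of that idea, assuming the tags arrive as a plain dict and that the language of the name is taken from a name:xx tag that duplicates the plain name. The suffix table, the function names and the fallback word "road" are hypothetical and only cover the examples mentioned in this thread:

```python
# Hypothetical suffix table; the entries follow the examples in this thread.
SUFFIXES = {
    "de": {"straße": "street", "weg": "way", "platz": "square", "allee": "avenue"},
    "sv": {"gatan": "street", "väg": "way"},
}


def road_word(name, lang):
    """Return an English generic term derived from the name's suffix, or None."""
    lowered = name.lower()
    for suffix, term in SUFFIXES.get(lang, {}).items():
        if lowered.endswith(suffix):
            return term
    return None


def instruction(tags):
    """Build a routing phrase from OSM-style tags (a plain dict)."""
    name = tags.get("name", "")
    # The language is only known if some name:xx tag duplicates the plain name.
    lang = next((key[len("name:"):] for key, value in tags.items()
                 if key.startswith("name:") and value == name), None)
    term = road_word(name, lang) if lang else None
    # Without a known language there is no safe suffix table to apply.
    return f"Go down the {term or 'road'} {name}".strip()


print(instruction({"name": "Lindenallee", "name:de": "Lindenallee"}))
print(instruction({"name": "Lindenallee"}))
```

With name:de present the phrase uses "avenue"; with only the plain name the sketch has to fall back to the generic word, which is the difficulty with an unknown source language.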
>> So a recommendation might be to
>> - always tag name
>> - if you translate name into different languages, always add 
>> name:originalLanguageCode with the same content
>> - if you want, add that even if you don't translate it to different 
>> languages.
>>
>> Yes, that's redundant - but it's easy to cut out for software (cut 
>> out every language attribute that equals the plain name), if wanted; 
>> and it's less error prone than a tag like "language=de" or like the 
>> lists of default language areas you propose above.
>> Sure: These lists are helpful for all cases where only name is given, 
>> and that's a necessity for great software dealing with that, but 
>> that's the way defaults in OSM work: there should be few defaults 
>> for mappers, where they decide not to add a tag, but more 
>> defaults for data consumers, who could/should be able to make a best 
>> guess where data is missing. 
> You say that there should be few defaults for mappers. But what you 
> propose is exactly the opposite: You'd have a default, meaning that 
> you would need to create a name:originallanguage even if there is a 
> name present. I would bet that nobody does this. And if you don’t do 
> it like that, chaos will occur if you decide to display the name.
Wait...
I agree: even in the long term the majority of objects will certainly not 
have a name:originallanguage in addition to the plain name tag. 
This is part of the incompleteness we have everywhere in OSM.
I disagree that this by itself would lead to chaos.

Imagine a text-based application that could be read aloud by software. 
To do that properly, names should be spoken with the pronunciation of the 
language they come from.
Let's consider a screen reader for browsers and a browser-based 
application as an example. Outputting "Dies ist der Times Square in 
New York" (this is Times Square in New York) is simple, but a 
screen reader that only knows German would speak it 
roughly like (not sure if I get it comparable for English speakers 
here): "Dees ist der Teames Square in Nu Johk", because nothing tells it 
that Times Square and New York are names from the English 
language. On a website, additional markup could ideally solve that 
(given that the screen reader supports English as well in the 
user's setup): <p lang="de">Dies ist der <span lang="en">Times 
Square</span> in <span lang="en">New York</span></p>.
But to generate markup like this the software has to know the 
language.
Sure: this may be approximated from the area of the world, 
and yes, developers have to use something like that for the usual case 
where the language is still unknown, but in the text-to-speech area 
this would produce many wrong results.
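
As a rough sketch of how such markup could be generated, assuming again that the language of a name is only known when a name:xx tag duplicates the plain name. The function names and the template format are hypothetical; html.escape is from the Python standard library:

```python
from html import escape


def lang_of(tags):
    """Return xx for a name:xx tag that equals the plain name, else None."""
    name = tags.get("name")
    for key, value in tags.items():
        if key.startswith("name:") and value == name:
            return key[len("name:"):]
    return None


def sentence_html(sentence_lang, template, places):
    """Fill `template` with place names, marking up names in a foreign language.

    `places` maps a template slot to an OSM-style tag dict.
    """
    rendered = {}
    for slot, tags in places.items():
        name = escape(tags["name"])
        lang = lang_of(tags)
        # Only wrap the name if its language is known and differs.
        if lang and lang != sentence_lang:
            name = f'<span lang="{escape(lang)}">{name}</span>'
        rendered[slot] = name
    return f'<p lang="{escape(sentence_lang)}">{template.format(**rendered)}</p>'


places = {
    "square": {"name": "Times Square", "name:en": "Times Square"},
    "city": {"name": "New York", "name:en": "New York"},
}
print(sentence_html("de", "Dies ist der {square} in {city}", places))
```

If the name:en tags were missing, both names would be emitted without a span and the screen reader would fall back to German pronunciation.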
>
> In contrast, if you do it based on region, it would simplify things 
> much more:
>
> 1. You take the nodes/relation for Canada, add language=en.
> 2. You take the nodes/relation for Québec: language=fr
>
> Then everybody would just continue using name=British Columbia and 
> name=Montréal, and no problem. The multilingual renderer would then 
> show, in case the user wants to see French names, name=Montréal and 
> name:fr=Colombie-Britannique. If the user is English, it would show 
> name:en=Montreal and name=British Columbia.
I completely agree, as long as it's only about displaying. I also 
agree that this is a valid fallback, but as I showed above it cannot 
solve all problems.
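
For the display case, the fallback could look roughly like the following sketch. The region table and the function names are hypothetical and follow the Canada/Québec example above; the point is that the language returned for a plain name is only a guess:

```python
# Hypothetical region defaults, following the Canada/Québec example above.
REGION_LANGUAGE = {"Canada": "en", "Québec": "fr"}


def guess_language(regions):
    """Guess a language from enclosing regions, listed innermost first."""
    for region in regions:
        if region in REGION_LANGUAGE:
            return REGION_LANGUAGE[region]
    return None


def label(tags, user_lang, regions):
    """Return (text, language) to display for a user who prefers `user_lang`."""
    explicit = tags.get(f"name:{user_lang}")
    if explicit is not None:
        # An explicit name:xx tag is a statement by a mapper, not a guess.
        return explicit, user_lang
    # Fall back to the plain name; its language is only a regional guess.
    return tags.get("name"), guess_language(regions)


print(label({"name": "Montréal", "name:en": "Montreal"}, "en", ["Québec", "Canada"]))
print(label({"name": "Montréal"}, "fr", ["Québec", "Canada"]))
```

This is enough to pick a label, but the second element of the result is exactly the approximation that goes wrong in mixed-language areas.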
Even for rendering I'm not sure that's really an optimal solution for 
languages written right-to-left or top-to-bottom. Here you have to know at 
least these characteristics of the language to decide about label sizes 
and placements - not sure if that's really given in the Unicode 
characters themselves.
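
For the horizontal direction at least, the characters do carry some information: Unicode assigns each character a bidirectional class, which Python exposes as unicodedata.bidirectional. A minimal sketch (the helper name is hypothetical, and looking only at the first strongly directional character is a simplification) that says nothing about vertical layout:

```python
import unicodedata

# Bidirectional classes of strongly right-to-left characters:
# "R" (e.g. Hebrew letters) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def base_direction(text):
    """Return 'rtl' or 'ltr' from the first strongly directional character.

    Returns None if the text has no such character (e.g. digits only).
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in RTL_CLASSES:
            return "rtl"
        if bidi == "L":
            return "ltr"
    return None


print(base_direction("Berlin"))
print(base_direction("القاهرة"))
```

Whether a label should be set vertically cannot be read off this way, so that part would still need knowledge of the language.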
> Tell me where this is not easier than adding a redundant name:en or 
> name:fr for every town, bus stop and street in Canada. You would only 
> have to change the multilingual renderer so that it would display it 
> like that. This is no problem because I guess it is still in 
> development – It could be done relatively easily (speaking from a 
> non-developer standpoint).
Examples above.
And yes, it's easier to skip the native language as a separate tag. It 
will work for most cases; but it won't for many others.
We're not a map, we're a geo database, and languages are important for 
that as well, especially foreign languages. It is in 
fact interesting to see which pubs and restaurants in Germany have 
names from English/French/Spanish/Italian/... It's 
fascinating to see where e.g. pubs have Cymraeg (Welsh) or Gaelic names 
in their "native" areas and outside.
And these are examples that occur often.
> Plus, most of today's nodes only have a name=... tag, not a 
> name:xyz=... one. You would not need to change anything.
Sure. Software has to support that, and has to make a best guess, but 
it's only that: a best guess, and sometimes it's wrong, especially in 
multi-language parts of the world. Assuming English as the language in 
the Hispanic cities, towns or suburbs of the United States (e.g. Santa 
Fe, New Mexico [1]) is error-prone, and I'm sure there are areas where 
two or more languages are used roughly equally.

So:
- Yes: Software developers should support guessing the natural language 
(where that's necessary)
- No: Mappers should NOT delete localized name tags on the assumption of 
redundancy, even if these are equal to the local one.
- No: Mappers should NOT be told to never add localized tags where only 
a single name tag exists.

regards
Peter

[1] 
http://www.openstreetmap.org/?lat=35.68022&lon=-105.94028&zoom=17&layers=M


