[Tagging] Identifying language regions
Imre Samu
pella.samu at gmail.com
Tue Apr 24 19:00:26 UTC 2018
>your schema is neither simple nor usable for multilingual areas
>what's the primaryOnTheGroundLang for Brussels ? or Fribourg ?
>if I understand you correctly, ...
Sorry,
maybe "heuristic" is a better word: "a practical method not guaranteed
to be optimal or perfect, but sufficient for the immediate goals."
https://en.wikipedia.org/wiki/Heuristic
We can improve on this and devise more complex metadata that covers most of
the extreme cases, and maybe we can reach ~99.5% precision.
We can add some other rules:
- detecting the charset, i.e. the script (Arabic, Latin, Greek, Cyrillic,
Hebrew, Traditional Chinese, Korean, Japanese, Thai, ...)
- separators + language order
- extra regexps or other special rules (
https://en.wikipedia.org/wiki/Business_rules_engine )
name_en,     Wikidata, separator, Language_orders, PossibleLanguages, Charset_order,   example
Brussels,    Q239,     " - ",     'fr nl',         fr nl de,          "latin latin",   "Rue du Marché aux Poulets - Kiekenmarkt"
Hong Kong,   Q8646,    " ",       'zh en',         zh en,             "chinese latin", "干諾道中 Connaught Road Central"
Sardinia,    Q1462,    "/",       unknown,         sc ca co lij it,   ,                "Nùgoro/Nuoro"
South Tyrol, Q15124,   " - ",     unknown,         it de lld,         ,                "Urtijëi - St. Ulrich - Ortisei"
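As a rough illustration of how the separator and charset rules could be
combined (only a sketch, not an existing tool): the Python below takes the
Brussels and Hong Kong rows, splits a "name" value and checks the script of
each part. RULES, script_of() and split_name() are made-up names for this
example, and the script labels are coarse guesses taken from Unicode
character names, so real data would need a more careful classifier.

```python
import unicodedata

# Hypothetical rule rows, transcribed from the Brussels and Hong Kong examples.
RULES = {
    "Q239": {"separator": " - ", "languages": ["fr", "nl"],
             "scripts": ["latin", "latin"]},
    "Q8646": {"separator": " ", "languages": ["zh", "en"],
              "scripts": ["chinese", "latin"]},
}


def script_of(text):
    """Coarse script label, based on the first letter found in text."""
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("CJK"):
                return "chinese"
            # e.g. "LATIN SMALL LETTER E WITH ACUTE" -> "latin"
            return name.split(" ")[0].lower() if name else "unknown"
    return "unknown"


def split_name(name, rule):
    """Map language code -> name part, or None if the rule does not fit."""
    sep = rule["separator"]
    if sep.strip():
        parts = [p.strip() for p in name.split(sep)]
    else:
        # Whitespace separator: merge neighbouring words of the same script.
        parts = []
        for token in name.split():
            if parts and script_of(parts[-1]) == script_of(token):
                parts[-1] += " " + token
            else:
                parts.append(token)
    if len(parts) != len(rule["languages"]):
        return None
    if [script_of(p) for p in parts] != rule["scripts"]:
        return None
    return dict(zip(rule["languages"], parts))


brussels = split_name("Rue du Marché aux Poulets - Kiekenmarkt", RULES["Q239"])
hong_kong = split_name("干諾道中 Connaught Road Central", RULES["Q8646"])
```

A name that does not match the rule (wrong number of parts, or unexpected
scripts) returns None, so it can be flagged for manual review instead of
being guessed.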
In South Tyrol and Sardinia (Sardegna) the language order is not
defined ("unknown"). But we can add a QA check that verifies that all
the "name:*" tags exist.
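Such a QA check could look roughly like this (a sketch under assumptions:
unmatched_name_parts() is a made-up helper, and the tag values are
illustrative, not copied from the OSM database). It reports every part of
"name" that is not repeated in some "name:*" tag.

```python
def unmatched_name_parts(tags, separator):
    """Return the parts of tags['name'] that no name:<lang> tag repeats."""
    localized = {value for key, value in tags.items()
                 if key.startswith("name:")}
    parts = [p.strip() for p in tags.get("name", "").split(separator)]
    return [p for p in parts if p and p not in localized]


# Illustrative tags for the South Tyrol example; the values are assumed.
ortisei = {
    "name": "Urtijëi - St. Ulrich - Ortisei",
    "name:lld": "Urtijëi",
    "name:de": "St. Ulrich",
    "name:it": "Ortisei",
}

# The same feature with the German name tag missing.
incomplete = {k: v for k, v in ortisei.items() if k != "name:de"}

complete_report = unmatched_name_parts(ortisei, " - ")
incomplete_report = unmatched_name_parts(incomplete, " - ")
```

An empty list means every part of the name has a matching "name:*" tag, so
the language of each part is known even without a fixed language order.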
Maybe we can solve this problem in the new OSM data model.
2018-04-24 19:20 GMT+02:00 marc marc <marc_marc_irc at hotmail.com>:
> your schema is neither simple nor usable for multilingual areas
> what's the primaryOnTheGroundLang for Brussels ? or Fribourg ?
> if I understand you correctly, someone needs to travel the city and count
> how many NameOnTheGround are in fr and how many in nl, and only then can
> they create the metadata. wow !
> and what if 2 languages have the same count ?
> because in Brussels all street signs are bilingual.
>
> a KISS schema for a boundary looks like
> language:fr=main or official or designated + language:nl=the same
> or official_language:fr=yes + official_language:nl=yes
> or official_language=fr;nl
>
> and if somebody wants to include some kind of on-the-ground statistic or
> spoken language, that's maybe another challenge... and I have no idea what
> kind of source you'll find for that.
>
> On 24.04.18 at 18:56, Imre Samu wrote:
> >> The main problem multilingual map effort is trying to solve is how to
> >> calculate the language of the "name" tag.
> >
> > As I understand it, we need "simple metadata" about the "current
> > mapping rules" [ https://wiki.openstreetmap.org/wiki/Multilingual_names ]
> > So we can use this for:
> > - Multilingual Maps
> > - OSM Editors - checking/validating character sets, extreme characters
> > - "Localization of name suggestion":
> > https://github.com/osmlab/name-suggestion-index/issues/11
> > - other QA tools ( osmcha?)
> >
> > My biggest problem is the "on the ground" rule:
> > "The "on the ground" rule remains the method of determining the
> > appropriate value for the name tag."
> > https://wiki.osmfoundation.org/wiki/Working_Group_Minutes/DWG_2014-06-05_Special_Crimea
> >
> > But sometimes reusing this metadata for QA rules is not so simple :
> > - " Béla Bartók square in Paris. The “ó” is not valid in French." see
> > more: https://wiesmann.codiferes.net/wordpress/?p=15187
> >
> >
> > *My pragmatic solution*
> >
> > In my mind, these are 2 separate problems:
> > - inventing good metadata for every case ( see
> > https://wiki.openstreetmap.org/wiki/Multilingual_names for example:
> > Hong Kong )
> > - storing the metadata [ as an OSM tag; in the OsmWiki; on
> > Github (https://github.com/osmlab/....) ]
> >
> >
> > First, we can create simple metadata keyed by the Wikidata IDs of
> > the OSM admin areas,
> >
> > like a simple table mapping Wikidata ID (OSM admin area) to
> > primary/secondary language code:
> >
> > name_en, Wikidata, primaryOnTheGroundLang, secondaryOnTheGroundLang
> > Aruba, Q21203, nl ,
> > Afghanistan,Q889, ps
> > Angola, Q916, pt
> > Anguilla,Q25228, en
> > Albania,Q222, sq
> > Åland Islands,Q5689, sv
> > ..
> > Crimea, Q7835, ru, uk
> > Russia, Q159, ru,
> > Ukraine, Q212, uk,
> > ...
> >
> > - If areas overlap (e.g. "Crimea"), the smaller area has higher
> > priority
> > - We can merge this metadata with the OSM data, and then we have polygons.
> >
> >
> >
> >
> >
> >
> > 2018-04-24 15:58 GMT+02:00 Yuri Astrakhan <yuriastrakhan at gmail.com>:
> >
> > The main problem multilingual map effort is trying to solve is how
> > to calculate the language of the "name" tag. Without it, name tag
> > becomes nearly useless. For example:
> >
> > * An Italian user viewing a feature in China with two tags: "name"
> > and "name:fr". In this case, "name:fr" tag is preferred because
> > "name" is likely to be in Chinese - not great for an Italian speaker.
> > * Same tags, but the feature is in Italy -- now "name" tag is the
> > better choice because the name is actually in the same language as
> > the reader.
> >
> > Without knowing the language of the "name" tag, we cannot use it as
> > part of the "script matching" - give preference to languages that
> > use the same script as the reader, even if the language is different.
> >
> > On Tue, Apr 24, 2018 at 12:29 PM, Andy Townsend <ajt1047 at gmail.com> wrote:
> >
> > On 24/04/2018 09:11, Rory McCann wrote:
> >
> > Ireland has 2 official languages (Irish first & then
> > English), but only ~2% of the population speak Irish daily.
> > There are some legally defined regions of Ireland which are
> > supposed to be "Irish speaking areas", but even there Irish
> > is a minority language. So how should that be tagged? (Some
> > day we'll get around to mapping the Gaeltachtaí)
> >
> >
> > Ireland's pretty much a "best case" for this as it does have
> > defined language regions for Irish. Most places don't.
> >
> >
> > If you want to know the language in a multi-lingual area,
> > why not look at the name, and name:XX tags. If the name
> > value is the same as a name:Z then Z is the language.
> >
> >
> > That won't always work. You can probably guess the example I'm
> > going to pick next - https://www.openstreetmap.org/node/52241235 :)
> >
> > For those unaware, the story there is summarised at
> > https://en.wikipedia.org/wiki/An_Daingean#Name . It's a while
> > since I've been there; not sure how much of a "cause celebre" it
> > is currently. I've certainly heard people on RTE refer to it as
> > "Dingle / An Daingean" (that's the English name and the commonly
> > used Irish name but not the official Irish name...).
> >
> > Best Regards,
> > Andy
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > Tagging mailing list
> > Tagging at openstreetmap.org <mailto:Tagging at openstreetmap.org>
> > https://lists.openstreetmap.org/listinfo/tagging
> > <https://lists.openstreetmap.org/listinfo/tagging>