[Tagging] Identifying language regions

Imre Samu pella.samu at gmail.com
Tue Apr 24 19:00:26 UTC 2018


>your schema is neither simple nor usable for multilingual areas
>what's the primaryOnTheGroundLang for Brussels? or Fribourg?
>if I understand you very well,  ...

Sorry,
maybe "heuristic" is a better word: "a practical method not guaranteed
to be optimal or perfect, but sufficient for the immediate goals."
https://en.wikipedia.org/wiki/Heuristic

We can improve on this and design more complex metadata that covers most
of the extreme cases, and maybe we can reach ~99.5% precision.

We can add some other rules:
- detecting the charset (Arabic, Latin, Greek, Cyrillic, Hebrew, Chinese
Traditional, Korean, Japanese, Thai, ...)
- separators + language order
- some extra regexps or other special rules (
https://en.wikipedia.org/wiki/Business_rules_engine )
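
A minimal sketch of the charset rule, assuming it is enough to look at
the Unicode character names from Python's standard unicodedata module.
The SCRIPT_PREFIXES mapping and the function names are hypothetical, not
part of any OSM tool. Han characters are shared between Chinese and
Japanese, so the "chinese" label is only a guess without the area
metadata.

```python
import unicodedata

# Hypothetical mapping from Unicode character-name prefixes to the
# charset labels used in this thread; extend as needed.
SCRIPT_PREFIXES = {
    "LATIN": "latin",
    "CJK": "chinese",  # assumption: Han characters may also be Japanese
    "ARABIC": "arabic",
    "GREEK": "greek",
    "CYRILLIC": "cyrillic",
    "HEBREW": "hebrew",
    "HANGUL": "korean",
    "HIRAGANA": "japanese",
    "KATAKANA": "japanese",
    "THAI": "thai",
}


def char_script(ch):
    """Return a charset label for one character, or None for non-letters."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    for prefix, script in SCRIPT_PREFIXES.items():
        if name.startswith(prefix):
            return script
    return "other"


def script_order(text):
    """List charsets in order of appearance, collapsing consecutive repeats."""
    order = []
    for ch in text:
        script = char_script(ch)
        if script and (not order or order[-1] != script):
            order.append(script)
    return order
```

For a Hong Kong style name, script_order("干諾道中 Connaught Road Central")
gives ["chinese", "latin"]. For a Brussels style name it gives only
["latin"], so the charset alone cannot separate fr from nl there; that
case needs the separator rule.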


name_en,     Wikidata, separator, Language_orders, PossibleLanguages, Charset_order,   example
Brussels,    Q239,     " - ",     'fr nl',         fr nl de,          "latin latin",   "Rue du Marché aux Poulets - Kiekenmarkt"
Hong Kong,   Q8646,    " ",       'zh en',         zh en,             "chinese latin", "干諾道中 Connaught Road Central"
Sardinia,    Q1462,    "/",       unknown,         sc ca co lij it,   ,                "Nùgoro/Nuoro"
South Tyrol, Q15124,   " - ",     unknown,         it de lld,         ,                "Urtijëi - St. Ulrich - Ortisei"
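
A rough sketch of how the separator and Language_orders columns could be
applied to a "name" value. The function split_name is hypothetical. It
assumes that every part except the last one contains no separator, which
holds for the examples in the table but will not hold for every name
(for example a Hong Kong name whose first part contains a space).

```python
def split_name(name, separator, language_order=None):
    """Split a combined name into (language, part) pairs.

    language_order is a list such as ["fr", "nl"], or None when the
    order is "unknown" for the area. An empty list is returned when the
    name does not have one part per language.
    """
    if language_order:
        # Split at most len(language_order) - 1 times, so that a " "
        # separator does not break up a multi-word last part.
        parts = name.split(separator, len(language_order) - 1)
        if len(parts) != len(language_order):
            return []
        return [(lang, part.strip())
                for lang, part in zip(language_order, parts)]
    # Order unknown: keep the parts, but without a language label.
    return [(None, part.strip()) for part in name.split(separator)]


brussels = split_name("Rue du Marché aux Poulets - Kiekenmarkt",
                      " - ", ["fr", "nl"])
hong_kong = split_name("干諾道中 Connaught Road Central", " ", ["zh", "en"])
sardinia = split_name("Nùgoro/Nuoro", "/")
```

Here brussels pairs "fr" with "Rue du Marché aux Poulets" and "nl" with
"Kiekenmarkt", while sardinia keeps both parts with the language set to
None.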

In South Tyrol and Sardinia (Sardegna) the language order is not
defined ("unknown"), but we can add a QA check that verifies that all
the "name:*" tags exist.
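
One possible shape for such a QA check, as a sketch only: it assumes
that each part of "name" should be equal to the value of some "name:xx"
tag on the same object. The function label_name_parts is hypothetical,
and the tags below are illustrative, not copied from the OSM database.

```python
def label_name_parts(tags, separator):
    """Pair each part of "name" with the language of a matching name:xx tag.

    Parts that match no name:xx value are paired with None, so that a
    QA tool can flag them as missing.
    """
    language_by_value = {}
    for key, value in tags.items():
        if key.startswith("name:"):
            # Keep the first language seen for a given value.
            language_by_value.setdefault(value, key[len("name:"):])
    parts = [p.strip() for p in tags.get("name", "").split(separator)]
    return [(language_by_value.get(part), part) for part in parts]


# Illustrative tags: name:it is missing on purpose.
tags = {
    "name": "Urtijëi - St. Ulrich - Ortisei",
    "name:lld": "Urtijëi",
    "name:de": "St. Ulrich",
}
labelled = label_name_parts(tags, " - ")
missing = [part for lang, part in labelled if lang is None]
```

With these tags, missing contains only "Ortisei". When all parts are
matched, the same result also gives the language order of the "name"
tag, even in areas where the order is "unknown".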

Maybe we can solve this problem in the new OSM data model.




2018-04-24 19:20 GMT+02:00 marc marc <marc_marc_irc at hotmail.com>:

> your schema is neither simple nor usable for multilingual areas
> what's the primaryOnTheGroundLang for Brussels? or Fribourg?
> if I understand you correctly, someone needs to travel the city and count
> how many NameOnTheGround are in fr and how many in nl, and only then can
> he create the metadata. wow!
> and what if 2 languages have the same count?
> because in Brussels all street signs are bilingual.
>
> a KISS schema for a boundary looks like
> language:fr=main or official or designated + language:nl=the same
> or official_language:fr=yes + official_language:nl=yes
> or official_language=fr;nl
>
> and if somebody wants to include some kind of on-the-ground statistics or
> spoken language, that's maybe another challenge... and I have no idea what
> kind of source you'll find for that.
>
> On 24. 04. 18 at 18:56, Imre Samu wrote:
> >> The main problem multilingual map effort is trying to solve is how to
> >> calculate the language of the "name" tag.
> >
> > As I understand it, we need "simple metadata" about the "current
> > mapping rules" [ https://wiki.openstreetmap.org/wiki/Multilingual_names ]
> > So, we can use this for:
> > - Multilingual maps
> > - OSM editors: checking/validating character sets, extreme characters
> > - "Localization of name suggestion":
> > https://github.com/osmlab/name-suggestion-index/issues/11
> > - other QA tools (osmcha?)
> >
> > My biggest problem is the "on the ground" rule:
> > /  "The "on the ground" rule remains the method of determining the
> > appropriate value for the name tag. "/
> > https://wiki.osmfoundation.org/wiki/Working_Group_Minutes/DWG_2014-06-05_Special_Crimea
> >
> > But sometimes reusing this metadata for QA rules is not so simple:
> > - "Béla Bartók square in Paris. The “ó” is not valid in French."
> > See more: https://wiesmann.codiferes.net/wordpress/?p=15187
> >
> >
> > *My  pragmatic solution*
> >
> > In my mind, these are 2 separate problems:
> > - inventing good metadata for every case ( see
> > https://wiki.openstreetmap.org/wiki/Multilingual_names  for example:
> > Hong Kong  )
> > - storing the metadata  [ as an OSM tag;   in the OsmWiki  ; in the
> > Github(https://github.com/osmlab/....) ]
> >
> > First, we can create simple metadata keyed by the "Wikidata" IDs of
> > the OSM admin areas,
> >
> > like a simple Wikidata (OSM admin area) to primary/secondary language
> > code table:
> >
> > name_en,       Wikidata, primaryOnTheGroundLang, secondaryOnTheGroundLang
> > Aruba,         Q21203,   nl,
> > Afghanistan,   Q889,     ps,
> > Angola,        Q916,     pt,
> > Anguilla,      Q25228,   en,
> > Albania,       Q222,     sq,
> > Åland Islands, Q5689,    sv,
> > ..
> > Crimea,        Q7835,    ru,                     uk
> > Russia,        Q159,     ru,
> > Ukraine,       Q212,     uk,
> > ...
> >
> > - If some areas overlap (e.g. "Crimea"), the smaller area has the higher
> > priority
> > - We can merge this metadata with the OSM data, and then we have polygons.
> >
> >
> >
> >
> >
> >
> > 2018-04-24 15:58 GMT+02:00 Yuri Astrakhan <yuriastrakhan at gmail.com>:
> >
> >     The main problem the multilingual map effort is trying to solve is
> >     how to calculate the language of the "name" tag.  Without it, the
> >     name tag becomes nearly useless.  For example:
> >
> >     * An Italian user viewing a feature in China with two tags: "name"
> >     and "name:fr".   In this case, "name:fr" tag is preferred because
> >     "name" is likely to be in Chinese - not great for an Italian speaker.
> >     * Same tags, but the feature is in Italy -- now "name" tag is the
> >     better choice because the name is actually in the same language as
> >     the reader.
> >
> >     Without knowing the language of the "name" tag, we cannot use it as
> >     part of the "script matching" - give preference to languages that
> >     use the same script as the reader, even if the language is different.
> >
> >     On Tue, Apr 24, 2018 at 12:29 PM, Andy Townsend <ajt1047 at gmail.com>
> >     wrote:
> >
> >         On 24/04/2018 09:11, Rory McCann wrote:
> >
> >             Ireland has 2 official languages (Irish first & then
> >             English), but only ~2% of the population speak Irish daily.
> >             There are some legally defined regions of Ireland which are
> >             supposed to be "Irish speaking areas", but even there Irish
> >             is a minority language. So how should that be tagged? (Some
> >             day we'll get around to mapping the Gaeltachtaí)
> >
> >
> >         Ireland's pretty much a "best case" for this as it does have
> >         defined language regions for Irish.  Most places don't.
> >
> >
> >             If you want to know the language in a multi-lingual area,
> >             why not look at the name, and name:XX tags. If the name
> >             value is the same as a name:Z then Z is the language.
> >
> >
> >         That won't always work.  You can probably guess the example I'm
> >         going to pick next - https://www.openstreetmap.org/node/52241235 :)
> >
> >         For those unaware, the story there is summarised at
> >         https://en.wikipedia.org/wiki/An_Daingean#Name .  It's a while
> >         since I've been there; not sure how much of a "cause celebre" it
> >         is currently.  I've certainly heard people on RTE refer to it as
> >         "Dingle / An Daingean" (that's the English name and the commonly
> >         used Irish name but not the official Irish name...).
> >
> >         Best Regards,
> >         Andy
> >
> >
> >
> >
> >
> >
> >         _______________________________________________
> >         Tagging mailing list
> >         Tagging at openstreetmap.org <mailto:Tagging at openstreetmap.org>
> >         https://lists.openstreetmap.org/listinfo/tagging
> >         <https://lists.openstreetmap.org/listinfo/tagging>