[Tagging] Nonbreakable spaces in name tags

Wed Jan 31 19:08:56 UTC 2018

Marc, I think that spelling rules are part of the specific language's
rules/naming, and should be allowed in tags. If some tool does not
normalize Unicode for searching, the tool should be fixed (it shouldn't be
too hard in most cases). Most search engines do these kinds of
normalizations before indexing anyway. Fixing Unicode normalization is far
simpler than building a grammar engine that knows how to break words in
every language, maintaining a huge set of exceptions for each (there are
always exceptions in these things), and attaching this engine to every
rendering system.
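
A minimal sketch of the kind of normalization meant here, assuming Python
and a hypothetical helper name (normalize_for_search). The space list is
taken from the SKIP rule quoted further down, minus the line-control
characters; dropping soft hyphens (U+00AD) is an extra assumption based on
their mention in this thread, not something any existing tool is known to
do. The stored tag value stays untouched; only the comparison key changes.

```python
import unicodedata

# Unicode space variants from the quoted SKIP rule (without \r, \n, \t).
UNICODE_SPACES = (
    "\u00a0\u2002\u2003\u2004\u2005\u2006"
    "\u2007\u2008\u2009\u200a\u3000"
)

# Map every space variant to an ASCII space.
_TABLE = {ord(ch): " " for ch in UNICODE_SPACES}
# Assumption: soft hyphens carry no meaning for search, so drop them.
_TABLE[0x00AD] = None


def normalize_for_search(name: str) -> str:
    """Return a comparison key for a name tag (hypothetical helper)."""
    # NFC makes precomposed and combining forms compare equal.
    name = unicodedata.normalize("NFC", name)
    return name.translate(_TABLE)


if __name__ == "__main__":
    with_nbsp = "U\u00a0Kaple"
    without_nbsp = "U Kaple"
    print(normalize_for_search(with_nbsp) == normalize_for_search(without_nbsp))
```

With a key like this, a street whose ways differ only in their use of
nbsp would still be found by one search, whichever variant is stored.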

On Wed, Jan 31, 2018 at 1:49 PM, marc marc <marc_marc_irc at hotmail.com>
wrote:

> I remain convinced that spelling rules have no place in OSM tags,
> even if it would be convenient.
> If they are to be added, the primary tools should first be asked to
> handle them before we consider using them. Otherwise the slightest search
> on a street name can fail, which is worse than having an incorrect
> line break.
>
> On 31. 01. 18 at 19:33, Simon Poole wrote:
> > IMHO we should in general treat all unicode space variants as a normal
> > ASCII space for processing and comparison purposes and leave it at that.
> >
> > And we don't have the issues just in name tags, see
> >
> > SKIP :
> > {
> >    "\r"
> > | "\n"
> > | " "
> > | "\t"
> > | "\u200A"
> > | "\u2009"
> > | "\u00A0"
> > | "\u2008"
> > | "\u2002"
> > | "\u2007"
> > | "\u3000"
> > | "\u2003"
> > | "\u2006"
> > | "\u2005"
> > | "\u2004"
> > }
> >
> > from my OH parser.
> >
> > Simon
> >
> >
> > On 31.01.2018 at 16:25, Matej Lieskovský wrote:
> >> So... can we reach some conclusion?
> >>
> >> I have a particular situation I need to resolve - some streets consist
> >> of ways that (among other, meaningful differences) vary in their usage
> >> of non-breakable spaces. Here are the possible solutions:
> >>
> >> 1) Start removing nbsp from local data
> >> 2) In case of conflict, prefer the variant without nbsp
> >> 3) In case of conflict, choose the more common variant
> >> 4) In case of conflict, prefer the variant with (correctly placed) nbsp
> >> 5) Start adding nbsp to local data
> >> 6) Leave things as they are
> >>
> >> To be perfectly honest, unless we can agree on whether nbsp should be
> >> encouraged or removed, I will use option 4. Option 6 (status quo) is
> >> pretty much the worst of both worlds, 5 is undeniably adding nbsp to
> >> the data (and too much work for now), and an eventual conversion from
> >> anything to 1 is trivial (which does not work for converting from 2 or
> >> 3 to 5). Since option 4 at least makes entire streets have the same
> >> name without loss of data or adding nbsp to streets that are ok so
> >> far, I consider it to be the best compromise in case of no consensus.
> >>
> >> Matej Lieskovský
> >>
> >> PS: I am starting to suspect that we might need a wiki page concerning
> >> Unicode usage in general (nbsp, soft hyphens, roman numerals,
> >> normalisation...). The link below does seem a little underwhelming:
> >> https://wiki.openstreetmap.org/wiki/Any_tags_you_like#Characters
> >>
> >> On 27 January 2018 at 01:50, Johnparis <okosm at johnfreed.com> wrote:
> >>> HTML has &nbsp; for non-breakable spaces (Unicode U+00A0).
> >>>
> >>> HTML has &shy; for soft hyphens (Unicode U+00AD).
> >>>
> >>> ------------------------------
> >>>
> >>> Message: 2
> >>> Date: Fri, 26 Jan 2018 23:04:32 +0100
> >>> From: Richard <ricoz.osm at gmail.com>
> >>> To: "Tag discussion, strategy and related tools"
> >>>          <tagging at openstreetmap.org>
> >>> Subject: Re: [Tagging] Nonbreakable spaces in name tags
> >>> Message-ID: <20180126220432.GA10615 at rz.localhost.localdomain>
> >>> Content-Type: text/plain; charset=iso-8859-1
> >>>
> >>> On Fri, Jan 26, 2018 at 03:48:42PM +0100, Matej Lieskovský wrote:
> >>>> Greetings!
> >>>>
> >>>> Several Slavic languages have rather formal rules about line breaks.
> >>> the problem is much broader, sooner or later OSM rendering will hit
> >>> word splitting.
> >>>
> >>>> PS: The rules are formal enough that there exists a 1997 program
> >>>> "Vlna" ("Tilde"), that can add nonbreakable spaces to TeX source files
> >>>> and is commonly used for important documents.
> >>> probably not all OSM languages have such tools and even if they do,
> >>> it can be tricky to determine which language rules to apply.
> >>>
> >>> I would think..
> >>> * if someone wants to use nonbreakable spaces he should be allowed to
> >>>    do so and tools should tolerate it (not necessarily understand but
> >>>    not break)
> >>> * if someone wants to use explicit word-split marks/soft-hyphens
> >>>    this should be somehow allowed too.
> >>>
> >>> Otherwise the software should try to do its best and apply heuristics
> >>> to avoid splitting lines in wrong places.
> >>> Not splitting 1000 034 should be obvious, roman numbers as well.
> >>> Prefer not splitting around "lonely" characters.
> >>> The rendering software can also compare texts with name tags and
> >>> prefer not to split names at all.
> >>>
> >>> Richard
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Tagging mailing list
> >>> Tagging at openstreetmap.org
> >>> https://lists.openstreetmap.org/listinfo/tagging
> >>>