[Tagging] Nonbreakable spaces in name tags

marc marc marc_marc_irc at hotmail.com
Wed Jan 31 18:49:59 UTC 2018


I remain convinced that spelling rules have no place in OSM tags,
even if it would be convenient.
If they are to be added, the main tools should first be asked to
handle them before anyone considers using them. Otherwise the simplest
search on a street name can fail, which is worse than having an incorrect
line break.
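
To illustrate the search problem, here is a minimal sketch of how a tool could compare names without caring which kind of space was typed. It assumes the space code points from the SKIP list quoted below; the function names and the street names in the usage note are made up for illustration and do not come from any existing tool.

```python
# Space code points copied from the SKIP list quoted below
# (ASCII space, tab and line endings are deliberately left alone).
SPACE_VARIANTS = (
    "\u00A0\u2002\u2003\u2004\u2005\u2006"
    "\u2007\u2008\u2009\u200A\u3000"
)

# Translation table mapping each variant to an ordinary ASCII space.
_TO_ASCII_SPACE = str.maketrans({ch: " " for ch in SPACE_VARIANTS})


def normalize_spaces(name: str) -> str:
    """Return the name with every listed Unicode space replaced by U+0020."""
    return name.translate(_TO_ASCII_SPACE)


def same_name(a: str, b: str) -> bool:
    """Compare two name values, ignoring which kind of space they use."""
    return normalize_spaces(a) == normalize_spaces(b)
```

With this, same_name("K\u00A0Zahradám", "K Zahradám") is true, while a plain == on the raw values is false. The stored tag is not modified; only the comparison is normalised.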

On 31.01.18 at 19:33, Simon Poole wrote:
> IMHO we should in general treat all Unicode space variants as a normal
> ASCII space for processing and comparison purposes and leave it at that.
> 
> And we don't have the issues just in name tags, see
> 
> SKIP :
> {
>    "\r"
> | "\n"
> | " "
> | "\t"
> | "\u200A"
> | "\u2009"
> | "\u00A0"
> | "\u2008"
> | "\u2002"
> | "\u2007"
> | "\u3000"
> | "\u2003"
> | "\u2006"
> | "\u2005"
> | "\u2004"
> }
> 
> from my OH parser.
> 
> Simon
> 
> 
> On 31.01.2018 at 16:25, Matej Lieskovský wrote:
>> So... can we reach some conclusion?
>>
>> I have a particular situation I need to resolve - some streets consist
>> of ways that (among other, meaningful differences) vary in their usage
>> of non-breakable spaces. Here are the possible solutions:
>>
>> 1) Start removing nbsp from local data
>> 2) In case of conflict, prefer the variant without nbsp
>> 3) In case of conflict, choose the more common variant
>> 4) In case of conflict, prefer the variant with (correctly placed) nbsp
>> 5) Start adding nbsp to local data
>> 6) Leave things as they are
>>
>> To be perfectly honest, unless we can agree on whether nbsp should be
>> encouraged or removed, I will use option 4. Option 6 (status quo) is
>> pretty much the worst of both worlds, 5 is undeniably adding nbsp to
>> the data (and too much work for now), and an eventual conversion from
>> anything to 1 is trivial (which does not work for converting from 2 or
>> 3 to 5). Since option 4 at least makes entire streets have the same
>> name without loss of data or adding nbsp to streets that are ok so
>> far, I consider it to be the best compromise in case of no consensus.
>>
>> Matej Lieskovský
>>
>> PS: I am starting to suspect that we might need a wiki page concerning
>> Unicode usage in general (nbsp, soft hyphens, roman numerals,
>> normalisation...). The link below does seem a little underwhelming:
>> https://wiki.openstreetmap.org/wiki/Any_tags_you_like#Characters
>>
>> On 27 January 2018 at 01:50, Johnparis <okosm at johnfreed.com> wrote:
>>> HTML has &nbsp; for non-breakable spaces (Unicode U+00A0).
>>>
>>> HTML has &shy; for soft hyphens (Unicode U+00AD).
>>>
>>> ------------------------------
>>>
>>> Message: 2
>>> Date: Fri, 26 Jan 2018 23:04:32 +0100
>>> From: Richard <ricoz.osm at gmail.com>
>>> To: "Tag discussion, strategy and related tools"
>>>          <tagging at openstreetmap.org>
>>> Subject: Re: [Tagging] Nonbreakable spaces in name tags
>>> Message-ID: <20180126220432.GA10615 at rz.localhost.localdomain>
>>> Content-Type: text/plain; charset=iso-8859-1
>>>
>>> On Fri, Jan 26, 2018 at 03:48:42PM +0100, Matej Lieskovský wrote:
>>>> Greetings!
>>>>
>>>> Several Slavic languages have rather formal rules about line breaks.
>>> the problem is much broader, sooner or later OSM rendering will hit word
>>> splitting.
>>>
>>>> PS: The rules are formal enough that there exists a 1997 program
>>>> "Vlna" ("Tilde"), that can add nonbreakable spaces to TeX source files
>>>> and is commonly used for important documents.
>>> probably not all OSM languages have such tools and even if they do, it can
>>> be tricky to determine which language rules to apply.
>>>
>>> I would think..
>>> * if someone wants to use nonbreakable spaces he should be allowed to do
>>>    so and tools should tolerate it (not necessarily understand but not
>>>    break)
>>> * if someone wants to use explicit word-split marks/soft-hyphens
>>>    this should be somehow allowed too.
>>>
>>> Otherwise the software should try to do its best and apply heuristics to
>>> avoid splitting lines in the wrong places.
>>> Not splitting 1000 034 should be obvious, Roman numerals as well. Prefer not
>>> splitting around "lonely" characters.
>>> The rendering software can also compare texts with name tags and prefer not
>>> to split names at all.
>>>
>>> Richard
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Tagging mailing list
>>> Tagging at openstreetmap.org
>>> https://lists.openstreetmap.org/listinfo/tagging
>>>


