[Tagging] Nonbreakable spaces in name tags

Simon Poole simon at poole.ch
Wed Jan 31 18:33:10 UTC 2018


IMHO we should in general treat all Unicode space variants as a normal
ASCII space for processing and comparison purposes and leave it at that.

And the issue isn't limited to name tags, see

SKIP :
{
  "\r"
| "\n"
| " "
| "\t"
| "\u200A"
| "\u2009"
| "\u00A0"
| "\u2008"
| "\u2002"
| "\u2007"
| "\u3000"
| "\u2003"
| "\u2006"
| "\u2005"
| "\u2004"
}

from my OH parser.
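
As a rough illustration of the same idea outside a grammar (this is not
code from the parser; UNICODE_SPACES, normalize_spaces and same_name are
names made up for the example), folding the space variants listed above
into an ASCII space before comparing two values might look like this in
Python:

```python
# Sketch only: fold the Unicode space variants from the SKIP list above
# into a plain ASCII space. All names here are illustrative assumptions.
UNICODE_SPACES = (
    "\u00A0"  # no-break space
    "\u2002\u2003\u2004\u2005\u2006"  # en, em, three-, four-, six-per-em space
    "\u2007\u2008\u2009\u200A"  # figure, punctuation, thin, hair space
    "\u3000"  # ideographic space
)

# Translation table mapping each space variant to U+0020.
_TO_ASCII_SPACE = str.maketrans({ch: " " for ch in UNICODE_SPACES})


def normalize_spaces(value: str) -> str:
    """Return value with every listed space variant replaced by ' '."""
    return value.translate(_TO_ASCII_SPACE)


def same_name(a: str, b: str) -> bool:
    """Compare two tag values while ignoring which kind of space was used."""
    return normalize_spaces(a) == normalize_spaces(b)
```

With this, same_name("Na\u00A0Příkopě", "Na Příkopě") is true, while the
stored values themselves are left untouched.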

Simon


Am 31.01.2018 um 16:25 schrieb Matej Lieskovský:
> So... can we reach some conclusion?
>
> I have a particular situation I need to resolve - some streets consist
> of ways that (among other, meaningful differences) vary in their usage
> of non-breakable spaces. Here are the possible solutions:
>
> 1) Start removing nbsp from local data
> 2) In case of conflict, prefer the variant without nbsp
> 3) In case of conflict, choose the more common variant
> 4) In case of conflict, prefer the variant with (correctly placed) nbsp
> 5) Start adding nbsp to local data
> 6) Leave things as they are
>
> To be perfectly honest, unless we can agree on whether nbsp should be
> encouraged or removed, I will use option 4. Option 6 (status quo) is
> pretty much the worst of both worlds, 5 is undeniably adding nbsp to
> the data (and too much work for now), and an eventual conversion from
> anything to 1 is trivial (which does not work for converting from 2 or
> 3 to 5). Since option 4 at least makes entire streets have the same
> name without loss of data or adding nbsp to streets that are ok so
> far, I consider it to be the best compromise in case of no consensus.
>
> Matej Lieskovský
>
> PS: I am starting to suspect that we might need a wiki page concerning
> Unicode usage in general (nbsp, soft hyphens, roman numerals,
> normalisation...). The link below does seem a little underwhelming:
> https://wiki.openstreetmap.org/wiki/Any_tags_you_like#Characters
>
> On 27 January 2018 at 01:50, Johnparis <okosm at johnfreed.com> wrote:
>> HTML has &nbsp; for non-breakable spaces (Unicode U+00A0).
>>
>> HTML has &shy; for soft hyphens (Unicode U+00AD).
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Fri, 26 Jan 2018 23:04:32 +0100
>> From: Richard <ricoz.osm at gmail.com>
>> To: "Tag discussion, strategy and related tools"
>>         <tagging at openstreetmap.org>
>> Subject: Re: [Tagging] Nonbreakable spaces in name tags
>> Message-ID: <20180126220432.GA10615 at rz.localhost.localdomain>
>> Content-Type: text/plain; charset=iso-8859-1
>>
>> On Fri, Jan 26, 2018 at 03:48:42PM +0100, Matej Lieskovský wrote:
>>> Greetings!
>>>
>>> Several Slavic languages have rather formal rules about line breaks.
>> the problem is much broader, sooner or later OSM rendering will hit word
>> splitting.
>>
>>> PS: The rules are formal enough that there exists a 1997 program
>>> "Vlna" ("Tilde"), that can add nonbreakable spaces to TeX source files
>>> and is commonly used for important documents.
>> probably not all OSM languages have such tools and even if they have it can
>> be tricky to determine which language rules to apply.
>>
>> I would think..
>> * if someone wants to use nonbreakable spaces he should be allowed to do
>>   so and tools should tolerate it (not necessarily understand but not
>>   break)
>> * if someone wants to use explicit word-split marks/soft-hyphens
>>   this should be somehow allowed too.
>>
>> Otherwise the software should try to do its best and apply heuristics to
>> avoid splitting lines in wrong places.
>> Not splitting 1000 034 should be obvious, roman numbers as well. Prefer not
>> splitting around "lonely" characters.
>> The rendering software can also compare texts with name tags and prefer not
>> to split names at all.
>>
>> Richard
>>
>>
>>
>>
>> _______________________________________________
>> Tagging mailing list
>> Tagging at openstreetmap.org
>> https://lists.openstreetmap.org/listinfo/tagging
>>

