[openstreetmap/openstreetmap-website] Changeset tags exceed limit of 255 unicode codepoints (Issue #6350)
Tom Hughes
notifications at github.com
Mon Aug 25 11:59:12 UTC 2025
tomhughes left a comment (openstreetmap/openstreetmap-website#6350)
I rather think the real problem is the Unicode sequence. The ";⚪ " actually consists of:
* U+003B SEMICOLON (encoded as `3b`)
* U+E0020 TAG SPACE (encoded as `f3 a0 80 a0`)
* U+E0020 TAG SPACE (encoded as `f3 a0 80 a0`)
* U+E0020 TAG SPACE (encoded as `f3 a0 80 a0`)
* U+E0020 TAG SPACE (encoded as `f3 a0 80 a0`)
* U+26AA MEDIUM WHITE CIRCLE (encoded as `e2 9a aa`)
* U+0020 SPACE (encoded as `20`)
which is 21 bytes but only 7 codepoints, or 3 characters: the four TAG SPACE characters have a grapheme break property of Extend, so they merge with the semicolon and that cluster counts as a single character.
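The breakdown above can be reproduced with a short Ruby sketch. The sequence is rebuilt from explicit `\u{...}` escapes so that the invisible TAG SPACE characters cannot get lost in transit; the names `SEQUENCE` and `utf8_breakdown` are only illustrative and do not come from the website code.

```ruby
# The sequence from the list above, written with escapes so the
# invisible TAG SPACE characters survive copy and paste.
SEQUENCE = ";" + ("\u{E0020}" * 4) + "\u{26AA}" + " "

# Illustrative helper: one [codepoint, UTF-8 bytes in hex] pair per codepoint.
def utf8_breakdown(str)
  str.each_char.map do |ch|
    hex = ch.bytes.map { |b| format("%02x", b) }.join(" ")
    [format("U+%04X", ch.ord), hex]
  end
end

utf8_breakdown(SEQUENCE).each { |codepoint, hex| puts "#{codepoint}  #{hex}" }

puts SEQUENCE.bytesize                  # 21 bytes
puts SEQUENCE.length                    # 7 codepoints
# Grapheme clusters; expected to be 3 when Ruby's Unicode data
# classifies the tag characters as Extend.
puts SEQUENCE.grapheme_clusters.length
```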
That, along with the MEDIUM WHITE CIRCLE using three bytes but counting as one character, means that the whole string is 268 bytes but only 250 characters or 254 codepoints, depending on how you prefer to count.
A quick test suggests that Ruby's string length counts codepoints, which is how it arrives at 254 as the total here.
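The three totals can be checked the same way. The full tag value is not quoted in this message, so the sketch below uses a hypothetical stand-in: 247 ASCII characters followed by the sequence, chosen only because it gives the same totals (247 + 21 bytes, 247 + 7 codepoints, 247 + 3 characters). `VALUE` and `size_report` are illustrative names, not anything from the website code.

```ruby
# Hypothetical stand-in for the real tag value: ASCII filler plus the
# sequence discussed above. The filler is an assumption for illustration.
TAIL  = ";" + ("\u{E0020}" * 4) + "\u{26AA}" + " "
VALUE = ("a" * 247) + TAIL

# Illustrative helper: report the three ways of measuring the string.
def size_report(str)
  {
    bytes:      str.bytesize,                 # UTF-8 encoded size
    codepoints: str.length,                   # what String#length returns
    graphemes:  str.grapheme_clusters.length  # user-perceived characters
  }
end

report = size_report(VALUE)
p report

# A check based on String#length sees 254 and stays within 255,
# while the encoded size is already past 255 bytes.
puts report[:codepoints] <= 255
puts report[:bytes] <= 255
```

So a limit expressed in codepoints and a limit expressed in bytes disagree on this value, which matches the 254 versus 268 figures above.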
--
Reply to this email directly or view it on GitHub:
https://github.com/openstreetmap/openstreetmap-website/issues/6350#issuecomment-3219986394