[openstreetmap/openstreetmap-website] SpamScorer: normalise whitespace in comparable form (PR #6661)
Copilot
notifications at github.com
Tue Dec 30 18:20:13 UTC 2025
@Copilot commented on this pull request.
## Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
---
💡 <a href="/openstreetmap/openstreetmap-website/new/master/.github/instructions?filename=*.instructions.md" class="Link--inTextBlock" target="_blank" rel="noopener noreferrer">Add Copilot custom instructions</a> for smarter, more guided reviews. <a href="https://docs.github.com/en/copilot/customizing-copilot/adding-repository-custom-instructions-for-github-copilot" class="Link--inTextBlock" target="_blank" rel="noopener noreferrer">Learn how to get started</a>.
> @@ -38,7 +38,7 @@ def score
attr_reader :text
def to_comparable_form(str)
- str.downcase(:fold).unicode_normalize(:nfkc)
+ str.downcase(:fold).unicode_normalize(:nfkc).gsub(/\s+/u, " ")
The implementation should trim leading and trailing whitespace as mentioned in the PR description. Without trimming, if a spammy phrase in the database has leading/trailing whitespace (e.g., " foo bar "), after normalization it would become " foo bar " and fail to match content "foo bar". Add .strip() at the end of the chain to ensure consistent normalization and prevent false negatives in phrase matching.
```suggestion
str.downcase(:fold).unicode_normalize(:nfkc).gsub(/\s+/u, " ").strip
```
> @@ -44,4 +44,19 @@ def test_spammy_phrases
scorer = SpamScorer.new_from_rich_text(r)
assert_equal 160, scorer.score.round
end
+
+ def test_to_comparable_form_collapses_unicode_whitespace
+ r = RichText.new("text", "x")
+ scorer = SpamScorer.new_from_rich_text(r)
+
+ input = " A\u00A0\tB\n\nC"
+ assert_equal " a b c", scorer.send(:to_comparable_form, input)
If the implementation is updated to add .strip() as suggested, this test assertion should be updated to expect "a b c" instead of " a b c" (without the leading space), to properly validate the trimming behavior.
--
Reply to this email directly or view it on GitHub:
https://github.com/openstreetmap/openstreetmap-website/pull/6661#pullrequestreview-3618957536
You are receiving this because you are subscribed to this thread.
Message ID: <openstreetmap/openstreetmap-website/pull/6661/review/3618957536 at github.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/rails-dev/attachments/20251230/3c968cbd/attachment-0001.htm>
More information about the rails-dev
mailing list