<blockquote>

<p>That's just ridiculous. How on earth are we supposed to programmatically determine what constitutes a "copycat" account? Or should we just send them all for your personal review?</p>

</blockquote>

<p>That's an unhelpful comment.</p>

<blockquote>

<p>suppress fiddling with unicode to impersonate legit osm users</p>

</blockquote>

<p>We already have some measures to prevent copycat account names: specifically, you are no longer allowed a name that is a case-insensitive duplicate of an existing one (see <a href="https://github.com/openstreetmap/openstreetmap-website/blob/34d663f01af07033dfca697ad607cb473aa70e40/app/models/user.rb#L41-L42">https://github.com/openstreetmap/openstreetmap-website/blob/34d663f01af07033dfca697ad607cb473aa70e40/app/models/user.rb#L41-L42</a>). We can extend this idea to cover other Unicode-based normalisation approaches, e.g. e-acute vs e-with-combining-acute, duplicated Unicode characters (see <a href="https://en.wikipedia.org/wiki/Duplicate_characters_in_Unicode">https://en.wikipedia.org/wiki/Duplicate_characters_in_Unicode</a>) and homoglyphs or near-homoglyphs (see <a href="https://en.wikipedia.org/wiki/Homoglyph">https://en.wikipedia.org/wiki/Homoglyph</a>).</p>

<p>The best approach will be to work with whatever functions are already available through PostgreSQL, e.g. <a href="https://www.postgresql.org/docs/current/static/unaccent.html">unaccent()</a>, through Ruby, e.g. <a href="https://ruby-doc.org/stdlib-2.2.1/libdoc/unicode_normalize/rdoc/String.html">unicode_normalize</a>, or through suitable rubygems if there are any. We shouldn't attempt to build our own Unicode normalisation rules!</p>
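
<p>As a rough sketch of the Ruby side, something like the following could produce a comparison key for display names. The helper name <code>display_name_key</code> is hypothetical and not part of openstreetmap-website, and it assumes Ruby 2.4 or later so that <code>downcase</code> handles non-ASCII letters. Note that NFKC only covers canonical and compatibility equivalents; it does not fold cross-script homoglyphs, which would need something more.</p>

```ruby
# Hypothetical helper, not taken from the openstreetmap-website codebase.
# Builds a comparison key so that equivalent spellings of a display name
# collapse to the same string.
def display_name_key(name)
  # NFKC folds canonical equivalents (precomposed vs combining forms) and
  # compatibility duplicates (e.g. fullwidth letters); downcase then makes
  # the comparison case-insensitive.
  name.unicode_normalize(:nfkc).downcase
end

precomposed = "caf\u00E9"                # e-acute as a single code point
combining   = "cafe\u0301"               # "e" followed by a combining acute
fullwidth   = "\uFF23\uFF21\uFF26\u00C9" # fullwidth C, A, F, then E-acute

puts display_name_key(precomposed) == display_name_key(combining) # true
puts display_name_key(precomposed) == display_name_key(fullwidth) # true

# Homoglyphs are NOT caught: Cyrillic "a" (U+0430) stays distinct from Latin "a".
puts display_name_key("p\u0430ul") == display_name_key("paul")     # false
```

<p>A key like this could be compared against the keys of existing names at signup, in the same way the case-insensitive check works today.</p>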

<p>Finally, the best long-term solution will be to include the account id in the URL, so that even the most creative of duplicate display names are easily distinguishable as separate accounts.</p>
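
<p>To illustrate the idea only: the path scheme <code>/user/&lt;id&gt;/&lt;display name&gt;</code> and both helper names below are assumptions made up for this sketch, not an existing or proposed route in the codebase. The point is that the numeric id, not the name, identifies the account.</p>

```ruby
require "erb"

# Hypothetical path scheme: /user/<account id>/<display name>.
# The display name is decorative; only the id identifies the account.
def user_path(id, display_name)
  "/user/#{Integer(id)}/#{ERB::Util.url_encode(display_name)}"
end

# Recover the account id from such a path, ignoring the name part.
# Returns nil when the path does not start with /user/<digits>.
def account_id_from_path(path)
  match = %r{\A/user/(\d+)(?:/|\z)}.match(path)
  match && match[1].to_i
end

# Two look-alike names still produce clearly different URLs.
puts user_path(1001, "caf\u00E9")
puts user_path(2002, "cafe\u0301")
puts account_id_from_path(user_path(1001, "caf\u00E9")) # 1001
```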



<div itemscope itemtype="http://schema.org/EmailMessage">

<div itemprop="action" itemscope itemtype="http://schema.org/ViewAction">

  <link itemprop="url" href="https://github.com/openstreetmap/openstreetmap-website/issues/1419#issuecomment-276337054"></link>

  <meta itemprop="name" content="View Issue"></meta>

</div>

<meta itemprop="description" content="View this Issue on GitHub"></meta>

</div>

