<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <div class="moz-cite-prefix">On 27/09/2020 16:28, Rodrigo Díez
      Villamuera wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAHsonMXyhAJUTVs8jX_99ShvHWHYLdX-1_RdJ=bXOv2fLDKYQQ@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr"><br>
        <div>I am importing a subset of nodes from UK (those tagged with
          amenity:pub) for a pet project.</div>
      </div>
    </blockquote>
    <p>Firstly - welcome!</p>
    <p><br>
    </p>
    <blockquote type="cite"
cite="mid:CAHsonMXyhAJUTVs8jX_99ShvHWHYLdX-1_RdJ=bXOv2fLDKYQQ@mail.gmail.com">
      <div dir="ltr">
        <div><br>
        </div>
        <div>When analysing the data I realised that some of these nodes
          contain a website: tag that does not contain an appropriate
          URL schema (http/https).</div>
        <div><br>
        </div>
        <div>Ie: <a href="http://www.mypub.com" moz-do-not-send="true">www.mypub.com</a>
          rather than <a href="http://www.mypub.com"
            moz-do-not-send="true">http://www.mypub.com</a> or <a
            href="https://www.mypub.com" moz-do-not-send="true">https://www.mypub.com</a></div>
      </div>
    </blockquote>
    <p>I'm not actually convinced that's a problem - as others have
      said, web browsers are perfectly capable of converting
      "<a class="moz-txt-link-abbreviated" href="http://www.mypub.com">www.mypub.com</a>" into either "<a class="moz-txt-link-rfc2396E" href="https://www.mypub.com">https://www.mypub.com</a>" or "<a class="moz-txt-link-rfc2396E" href="http://www.mypub.com">http://www.mypub.com</a>" as
      appropriate, so adding the scheme doesn't really add any value.  "Letting the
      browser sort it out" is a great approach as it can also deal with
      current and near-future changes such as the removal of TLS 1.0 and
      1.1 support.<br>
    </p>
    <p><br>
    </p>
    <blockquote type="cite"
cite="mid:CAHsonMXyhAJUTVs8jX_99ShvHWHYLdX-1_RdJ=bXOv2fLDKYQQ@mail.gmail.com">
      <div dir="ltr">
        <div><br>
        </div>
        <div>This goes in contradiction with the <a
            href="https://wiki.openstreetmap.org/wiki/Key:website"
            moz-do-not-send="true">Wiki documentation for website.</a></div>
      </div>
    </blockquote>
    <p>Unfortunately, OSM's wiki doesn't always reflect actual usage, and
      this is one example.  Changing "<a class="moz-txt-link-abbreviated" href="http://www.mypub.com">www.mypub.com</a>" to
      "<a class="moz-txt-link-rfc2396E" href="https://www.mypub.com">https://www.mypub.com</a>" doesn't really add any value unless you're
      updating something else about the pub at the same time.  In fact, using "<a class="moz-txt-link-abbreviated" href="http://www.mypub.com">www.mypub.com</a>"
      has some advantages here as it allows the user's web browser to
      negotiate https if available (the default nowadays) but fall back
      to http if not. <br>
    </p>
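    <p>As a rough illustration of that "try https, then fall back to
      http" behaviour, here is a minimal Python sketch. The function
      names and the <code>probe</code> callable are hypothetical (they
      are not part of any OSM tool), and real browsers do considerably
      more than this:</p>
<pre>
```python
from urllib.parse import urlsplit


def candidate_urls(value):
    """Return the URLs to try, in order, for a raw website tag value."""
    value = value.strip()
    # A value that already has an http/https scheme is used as-is.
    if urlsplit(value).scheme in ("http", "https"):
        return [value]
    # Otherwise prefer https and keep http as the fallback.
    return ["https://" + value, "http://" + value]


def resolve(value, probe):
    """Return the first candidate that probe() accepts, or None.

    probe is a hypothetical callable taking a URL and returning True
    if the site answered; how it checks is left to the caller.
    """
    for url in candidate_urls(value):
        if probe(url):
            return url
    return None
```
</pre>
    <p>Because the choice is made when the link is followed, nothing
      stored in the tag goes stale if the site later gains or loses
      https.</p>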
    <blockquote type="cite"
cite="mid:CAHsonMXyhAJUTVs8jX_99ShvHWHYLdX-1_RdJ=bXOv2fLDKYQQ@mail.gmail.com">
      <div dir="ltr">
        <div><br>
        </div>
        <div>I created a proposal for a one-off, scoped, automated edit
          for these nodes to find the appropriate scheme for the existing
          URL and retag the nodes.</div>
        <div><br>
        </div>
        <div>I added the proposal to the Automated edits log. You can
          read it <a
href="https://wiki.openstreetmap.org/wiki/Automated_edits/rodrigodiez/Add_missing_URL_scheme_to_pub_websites_in_UK"
            moz-do-not-send="true">here</a>.</div>
      </div>
    </blockquote>
    <p><br>
    </p>
    <p>What would be rather more interesting would be detecting websites
      that "don't or no longer represent the pub" in some way:  <br>
    </p>
    <ul>
      <li>Perhaps the pub had a website, but now has new tenants, and
        they now communicate with customers via a Facebook page?</li>
      <li>Perhaps the website is (like one of your examples) just for
        the brewery?</li>
      <li>Perhaps the website now points at domain parking?</li>
      <li>Perhaps the https certificate has expired, which at the very
        least indicates that the website is unlikely to be kept up to
        date?</li>
    </ul>
    <p>Any problems found would likely need to be resolved manually, but
      at least some of the above should be detectable automatically.</p>
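    <p>A minimal Python sketch of two such checks follows, assuming you
      already have the tagged URL and the URL you ended up at after
      following redirects. The helper names are hypothetical, and the
      host comparison is only a crude hint that a site may have been
      parked or redirected to a brewery; anything it flags would still
      need a human to look at it:</p>
<pre>
```python
import socket
import ssl
from urllib.parse import urlsplit


def bare_host(url):
    """Lower-cased host of a URL with any leading 'www.' removed."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def redirects_elsewhere(tagged_url, final_url):
    """True if following the tagged URL ended on a different host."""
    return bare_host(tagged_url) != bare_host(final_url)


def certificate_problem(host, port=443, timeout=10):
    """Return None if the TLS handshake verifies, else a short reason.

    An expired certificate is one of the failures reported here.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return None
    except ssl.SSLCertVerificationError as err:
        return err.verify_message
    except OSError as err:
        return "connection failed: " + str(err)
```
</pre>
    <p>For example, <code>certificate_problem("www.mypub.com")</code>
      would return a reason string for a pub whose certificate has
      lapsed, which could be written to a list for manual review rather
      than used to edit anything directly.</p>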
    <p>Best Regards,</p>
    <p>Andy</p>
    <p><br>
    </p>
  </body>
</html>