[Talk-GB] Hello world and automated change proposal: Add missing URL scheme on UK's Pubs websites

Andy Townsend ajt1047 at gmail.com
Mon Sep 28 11:21:01 UTC 2020


On 27/09/2020 16:28, Rodrigo Díez Villamuera wrote:
>
> I am importing a subset of nodes from UK (those tagged with 
> amenity:pub) for a pet project.

Firstly - welcome!


>
> When analysing the data I realised that some of these nodes contain a 
> website: tag that does not contain an appropriate URL schema (http/https).
>
> Ie: www.mypub.com rather than http://www.mypub.com or
> https://www.mypub.com

I'm not actually convinced that's a problem - as others have said, web 
browsers are perfectly capable of converting "www.mypub.com" into either 
"https://www.mypub.com" or "http://www.mypub.com" as appropriate, so 
adding the scheme doesn't really add any value.  "Letting the browser 
sort it out" is a great approach as it can deal with now/near future 
things such as the removal of TLS 1.0 and 1.1 support as well.


>
> This goes in contradiction with the Wiki documentation for website. 
> <https://wiki.openstreetmap.org/wiki/Key:website>

Unfortunately, OSM's wiki doesn't always reflect actual usage and this 
is one example.  Changing "www.mypub.com" to "https://www.mypub.com" 
doesn't really add any value unless you're actually updating something 
else about the pub.  Actually, using "www.mypub.com" has some advantages 
here as it allows the user's web browser to negotiate https if available 
(the default nowadays) but fall back to http if not.
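
For anyone consuming the data who does need a full URL, the "try https, 
fall back to http" idea can be sketched roughly as below.  This is only 
an illustration, not what any particular browser does internally: 
resolve_url, has_scheme and http_probe are hypothetical names, and the 
probe is passed in as a parameter so that the logic can be tried without 
touching the network.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Optional


def has_scheme(value: str) -> bool:
    """True if the tag value already starts with http:// or https://."""
    return value.lower().startswith(("http://", "https://"))


def resolve_url(value: str, probe: Callable[[str], bool]) -> Optional[str]:
    """Return a usable URL for a website tag value, preferring https.

    `probe` is any callable that reports whether a URL responds; it is
    an assumption of this sketch, not part of any OSM tooling.
    """
    value = value.strip()
    if has_scheme(value):
        return value  # nothing to do, the mapper already chose a scheme
    for scheme in ("https", "http"):
        candidate = f"{scheme}://{value}"
        if probe(candidate):
            return candidate
    return None  # neither scheme answered


def http_probe(url: str, timeout: float = 10.0) -> bool:
    """One possible probe: a plain GET using only the standard library."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return False
```

Calling resolve_url("www.mypub.com", http_probe) would do the lookup at 
the time of use, which is the point: the scheme is decided when someone 
actually visits the site rather than being frozen into the tag.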

>
> I created a proposal for a one-off, scoped, automated edit for these 
> nodes to find the appropriate scheme for the existing URL and retag the 
> nodes.
>
> I added the proposal to the Automated edits log. You can read it here 
> <https://wiki.openstreetmap.org/wiki/Automated_edits/rodrigodiez/Add_missing_URL_scheme_to_pub_websites_in_UK>.


What would be rather more interesting would be detecting websites that 
"don't or no longer represent the pub" in some way:

  * Perhaps the pub had a website, but now has new tenants, and they now
    communicate with customers on the Facebook page?
  * Perhaps the website is (like one of your examples) just for the brewery?
  * Perhaps the website now points at domain parking?
  * Perhaps the HTTPS certificate has expired, which at the very least
    indicates that the website is unlikely to be kept up to date?

Any problems found would likely need to be resolved manually, but at 
least some of the above should be detectable automatically.
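
As a very rough sketch of what "detectable automatically" might look 
like, the following flags three of the cases in the list: a website that 
ends up on Facebook, a page that looks like domain parking, and a TLS 
certificate that fails verification (which includes expiry).  All the 
names here (flag_website, tls_problem, SOCIAL_HOSTS, PARKING_PHRASES) 
are hypothetical, and the host and phrase lists are illustrative guesses 
rather than tested heuristics.  The "website is just for the brewery" 
case is not attempted, as that probably needs a human.

```python
import socket
import ssl
from typing import List, Optional
from urllib.parse import urlsplit

# Illustrative assumptions only; a real check would need better lists.
SOCIAL_HOSTS = ("facebook.com",)
PARKING_PHRASES = ("domain is for sale", "buy this domain", "domain parking")


def tls_problem(host: str, port: int = 443, timeout: float = 10.0) -> Optional[str]:
    """Attempt a verified TLS handshake; return a description on failure."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return None  # handshake and certificate verification succeeded
    except ssl.SSLCertVerificationError as err:
        return err.verify_message  # e.g. an expired certificate
    except OSError:
        return "could not connect"


def flag_website(final_url: str, body: str,
                 tls_error: Optional[str] = None) -> List[str]:
    """Return flags suggesting that a human should review this website tag.

    final_url is the URL after following redirects, body is the page text,
    and tls_error is whatever tls_problem() reported (if it was run).
    """
    flags = []
    host = (urlsplit(final_url).hostname or "").lower()
    if any(host == h or host.endswith("." + h) for h in SOCIAL_HOSTS):
        flags.append("social-media")
    text = body.lower()
    if any(phrase in text for phrase in PARKING_PHRASES):
        flags.append("possible-parking")
    if tls_error:
        flags.append("tls: " + tls_error)
    return flags
```

The output is only a list of things for a mapper to look at; nothing 
here decides what the tag should be changed to.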

Best Regards,

Andy

