<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">Donal Hunt's reference to the SRE book
is extremely valuable. I support his words. I support his stress
on creating scalable, reliable infrastructure and products.<br>
<div><br>
</div>
I was struck by the following words in the book:<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">
<pre><i>In general, an SRE team is responsible for the </i><em>availability, latency, performance, efficiency, change management,
monitoring, emergency response, and capacity planning</em><i> of their service(s).</i></pre>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">I have reworked the job advertisement,
mainly to slightly improve clarity and to untangle the scope of
work. See far below.</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">I think the scope as it stands is too
broad for one person. It strays from the responsibilities of an
SRE team quoted above, and is not tightly focused on creating
scalable, reliable infrastructure and products. However, the
scope of work is probably usable for our merged SRE/Sysadmin team
so I have let it stand. If much of it is delegated to the team of
voluntary sysadmins then it may be workable. <br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">The relationship between the SRE and
the SysAdmin team currently is cloudy, which may lead to problems
and turf wars. The Board must expect to be called upon to define
roles.<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">Something that irks me is that the SRE
is supposed to deal with 'users'. I'm not sure who 'users' are in
our context, but whoever they are they should not have direct
access to the SRE. The SRE is not the 'Helpdesk'. Perhaps 'users'
can log bugs and issues for attention of the team and of course
can deal directly with the Board member who manages the SRE
function. <br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">That Board member above should be the
manager of the SRE. This advert tiptoes cautiously around that
responsibility but IMHO that nettle must be bravely grasped as any
relationship other than direct management by one person will fail
in ugly, messy ways. <br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">Craig Allan</div>
<div class="moz-cite-prefix">===============================<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">On 2020/07/24 14:09, Donal Hunt wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAF1AMMMv0Zgb6YYJgA5MHc5C51dXWZMUJSyU9prSHTcB=hS-hw@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">...
<div class="gmail_quote">
<div><br>
</div>
<div>I've been both a system administrator and an SRE manager
over the past 2 decades and the philosophy between the roles
is quite different. There is a tendency to interchange
sysadmin, devops and SRE but I would argue that they are
distinct roles with differing end goals. An SRE will be
invaluable in helping the organisation take stock of where
they are at and support the delivery of changes that will
create scalable, reliable infrastructure and products.</div>
<div><br>
</div>
<div>I would encourage people to read the <a
href="https://landing.google.com/sre/sre-book/chapters/introduction/"
moz-do-not-send="true">first chapter</a> of the SRE book
which captured the essence of the discipline back in 2016 /
2017.</div>
<div><br>
</div>
<div>Donal</div>
</div>
</div>
</blockquote>
<p>===========================================</p>
<p>=Senior site reliability engineer, OpenStreetMap Foundation<br>
<br>
The OpenStreetMap Foundation (OSMF) operates the systems behind
OpenStreetMap, a global voluntary mapping project. The system
uses about 100 physical and virtual servers around the globe.
Keeping this core technical infrastructure running is a key
responsibility of the OSMF. Until now the system
operations and development roles have been admirably managed by a
team of very skilled volunteers, but with the continuous growth of
the system the management Board is now looking to transition a key
role to a permanent staff member.<br>
<br>
The engineer will work full time, and will be managed by one
member of the OSMF Board. The engineer will work with the
existing team of volunteers, and with support of the Board will be
able to delegate aspects of the Scope of Work (listed below) to
members of that team. <br>
<br>
An opportunity to apply for this position will be made available
to the members of the current voluntary sysadmin team before the
position is more widely advertised.<br>
<br>
==Scope of work<br>
<br>
===Operations<br>
Management, installation, configuration and maintenance of the
current system, and response to outages <br>
Management of relationships with data centres<br>
Disaster recovery <br>
<br>
===Development<br>
Improvement of all system infrastructure (hardware, software,
network, data centres…)<br>
Support for the application upgrade pipeline<br>
<br>
===Management<br>
With Board support, manage and adjust the delegation of work
to volunteers<br>
Support, mentor and enable volunteers and (eventually)
co-workers<br>
Risk assessment and mitigation planning<br>
<br>
===Policing<br>
Enforcement of usage policies<br>
Identifying and limiting abuse<br>
Support Board revision of usage policy<br>
<br>
===Support<br>
Interaction with users, dealing with user requests<br>
First line of answering user tickets<br>
Management of GitHub issues<br>
<br>
===Strategy<br>
Coordinating with the Board on which projects to work on<br>
Helping the Board establish long-term system development plans<br>
<br>
==Current Projects<br>
For information, current systems project proposals that are under
consideration include:<br>
<br>
===Operations<br>
AWS auditing and improvements<br>
Improving, centralising and reworking logging, monitoring,
reporting, and alerting<br>
Improving and reworking the tile serving architecture and
infrastructure<br>
Moving some infrastructure to containers or cloud. <br>
Moving to ‘server as a resource’ and away from 1 service =
1 server.<br>
Upgrading servers to Ubuntu 20.04<br>
Testing and improving backups<br>
Improving redundancy and availability of services<br>
Modernising runtime environments<br>
Network upgrades in Amsterdam<br>
Implementing Zero Downtime Upgrades (web, API, possibly
other deployments)<br>
Improved storage and hosting of community data (aerial
imagery, maps, photos…)<br>
Forum software upgrade<br>
Relaunch of GPX planet dumps <br>
===Development<br>
Improving the continuous integration and deployment
pipelines<br>
===Management<br>
Improving disaster recovery preparations<br>
Improving onboarding documentation<br>
===Policing<br>
Improving policy documents and anti-abuse enforcement<br>
===Support<br>
None<br>
===Strategy<br>
None<br>
<br>
<br>
==Profile<br>
<br>
The applicant should be a great communicator, with an excellent
command of written and spoken English, and should be willing and
able to collaborate online. They should be a creative and
inventive problem-solver.<br>
<br>
Being already involved in OpenStreetMap as a contributor, or
having experience with other Open Source or Open Data or volunteer
communities, will be useful to understand how our voluntary
community works. It should be noted that the sysadmin team and
board are all volunteers who have full-time jobs outside
OpenStreetMap. The successful engineer should be able to
self-organise and find direction in a sometimes difficult
environment that will benefit from their good communication and
inter-personal skills.<br>
<br>
==Technical requirements<br>
<br>
<i>The key words "MUST", "SHOULD", "MAY" are to be interpreted as
described in RFC 2119.</i><br>
<br>
The applicant MUST demonstrate experience with:<br>
<br>
Ubuntu or Debian based server administration<br>
Nginx<br>
Apache<br>
Shell scripting<br>
Git and GitHub<br>
HTTP<br>
AWS<br>
<br>
The applicant SHOULD have experience with:<br>
<br>
Squid<br>
DNS<br>
Chef<br>
Load balancing and high availability architectures<br>
Containerisation<br>
<br>
The applicant MAY have experience with:<br>
<br>
Varnish<br>
Python<br>
Mapnik<br>
Nominatim<br>
Leaflet<br>
Vector tiles<br>
Docker<br>
PostgreSQL, PostGIS<br>
MediaWiki<br>
Ruby, Rails<br>
<br>
<br>
==Employment/contracting structure<br>
<br>
The person will work from their own premises, and will determine
their own schedule most of the time.<br>
<br>
The OSMF is incorporated in England, deals frequently with UK
entities, and has most of its servers in York, London and
Amsterdam. A base in the UK or another country from which travel
to these places is easy would make some things easier, but it’s
not required. The OpenStreetMap Foundation is a global
organisation; working with people and systems in different time
zones and handling related scheduling constraints is expected.<br>
<br>
If the person is based in the UK, IR35 legislation makes it a lot
simpler for everyone if the OSMF hires them as an employee, rather
than a contractor or similar. The contract would in any case be
permanent, not fixed-term or temporary.<br>
</p>
</body>
</html>