Site Reliability / GitOps Engineer

Canonical, the publisher of Ubuntu, is a technology company at the forefront of the global move to open source.
Site Reliability
Staff Software Engineer
Remote
Enterprise SaaS
This job posting may no longer be active.

Description For Site Reliability / GitOps Engineer

Canonical, the company behind Ubuntu, is seeking a Site Reliability / GitOps Engineer to join its IS team. The role is an opportunity for a hands-on technologist with a passion for Linux to build a career with Canonical and drive success with Ubuntu and open source products.

As an SRE & GitOps engineer, you'll be responsible for supporting and maintaining all of Canonical's IT production services, used by over 60 million Ubuntu users. You'll drive operations automation to the next level in both private and public clouds, utilizing open source infrastructure as code software, CI/CD pipelines, and Canonical's leading products for software operation automation.

Key responsibilities include:

  • Developing infrastructure as code practices
  • Automating software operations for re-usability and consistency across clouds
  • Maintaining operational responsibility for Canonical's core services, networks, and infrastructure
  • Troubleshooting, capacity planning, and performance investigation
  • Collaborating with development teams on service architecture and operational procedures
  • Providing assistance to globally distributed teams
  • Carrying final responsibility for time-critical escalations

The ideal candidate will have:

  • Deep experience in defining operations in code
  • Strong modern engineering background (peer review, unit testing, SCM, CI/CD, Agile)
  • Python software development experience
  • Practical knowledge of Linux networking, storage, and administration
  • Extensive knowledge of cloud computing concepts and technologies
  • A bachelor's degree or higher, preferably in computer science or a related field
  • Excellent communication skills in English
  • Passion for open source, especially Ubuntu or Debian

Canonical offers a competitive base pay and additional benefits, including:

  • Fully remote working environment
  • Personal learning and development budget
  • Annual compensation review
  • Recognition rewards
  • Annual holiday leave
  • Parental leave
  • Employee Assistance Programme
  • Opportunity to travel for team 'sprints'
  • Priority Pass for travel

Join Canonical to be part of a company that's changing the world daily, challenging you to think differently, work smarter, and raise your game in the exciting field of open source technology.

Last updated a month ago

Responsibilities For Site Reliability / GitOps Engineer

  • Develop infrastructure as code practices
  • Automate software operations across private and public clouds
  • Maintain operational responsibility for Canonical's core services, networks, and infrastructure
  • Develop new features and improve resilience and scalability of cloud and container portfolio
  • Set up and maintain observability tools (Prometheus, Grafana, Elasticsearch)
  • Collaborate with development teams on service architecture and operational procedures
  • Provide assistance to globally distributed teams
  • Carry final responsibility for time-critical escalations

Requirements For Site Reliability / GitOps Engineer

Linux
Python
Kubernetes
  • Deep experience in defining operations in code
  • Strong modern engineering background (peer review, unit testing, SCM, CI/CD, Agile)
  • Python software development experience with large projects
  • Practical knowledge of Linux networking, routing, and firewalls
  • Hands-on experience administering enterprise Linux servers
  • Extensive knowledge of cloud computing concepts and technologies
  • Bachelor's degree or higher, preferably in computer science or related engineering field
  • Excellent communication skills in English
  • Ability to troubleshoot from kernel to web
  • Willingness to be flexible and learn new things quickly
  • Passion for open source, especially Ubuntu or Debian

Benefits For Site Reliability / GitOps Engineer

Equity
Parental Leave
Education Budget
  • Fully remote working environment
  • Personal learning and development budget of USD 2,000 per annum
  • Annual compensation review
  • Recognition rewards
  • Annual holiday leave
  • Parental leave
  • Employee Assistance Programme
  • Opportunity to travel to meet colleagues at 'sprints'
  • Priority Pass for travel, and travel upgrades for long-haul company events
