Site Reliability Engineer

Cover Genius

Cover Genius is a Series E insurtech that protects the global customers of the world's largest digital companies, including Booking Holdings, Intuit, Uber, Hopper, Ryanair, and more.

Sydney NSW, Australia

Site Reliability

Senior Software Engineer

Hybrid

AI · Finance

Description For Site Reliability Engineer

Cover Genius is a Series E insurtech that protects the global customers of major digital companies. As a Site Reliability Engineer, you'll ensure the reliable operation of production systems, working across technical areas to automate and improve platforms. Key responsibilities include:

Analyzing, testing, and modifying systems for reliability and performance
Developing observability tools and dashboards
Implementing automation tools and CI/CD pipelines, and reducing toil
Troubleshooting production issues
Applying AWS and GCP knowledge to maintain cloud infrastructure
Collaborating with Software Engineers to improve tools and procedures
Developing documentation and runbooks
Optimizing computing infrastructure costs

Requirements:

Understanding of SRE principles and best practices
Experience with modern observability tools (ELK/EFK, Prometheus, Grafana)
Scripting skills (Bash, Python, Go)
Experience with infrastructure as code (Terraform, CloudFormation)
Container technology knowledge (Docker, Kubernetes)
Linux experience
Networking and system architecture understanding
AWS/GCP knowledge
Bachelor's degree in Computer Science/Engineering or equivalent experience
Strong communication and documentation skills
Self-motivated learner with attention to detail

Join a diverse team across 20+ countries, recognized as the #1 fastest-growing company in APAC by the Financial Times in 2020. Be part of an innovative company that values being bold, authentic, purposeful, and inspired.

Responsibilities For Site Reliability Engineer

Analyze, test, and modify systems to improve reliability and optimize performance, particularly at the architectural and infrastructure level
Develop and maintain observability tooling and dashboards
Implement automation tools, frameworks, and CI/CD pipelines, and reduce toil
Troubleshoot production issues and coordinate with the development team to streamline code deployments
Apply AWS and GCP knowledge and skills to create and maintain cloud infrastructure for software projects
Design, develop and implement software integrations
Collaborate with Software Engineers and other team members with the goal of improving engineering tools, systems, procedures and data security
Develop and maintain design and troubleshooting documentation and runbooks
Optimize and control costs of the company's computing infrastructure

Requirements For Site Reliability Engineer

Linux

Python

Kubernetes

Understanding of SRE Principles and best practices
Experience using and configuring modern observability tools such as ELK/EFK, Prometheus, and Grafana
Comfortable scripting and developing internal tooling with Bash and at least one programming language (e.g. Python, Go)
Experience working with infrastructure and configuration as code tools such as Terraform, CloudFormation, Chef, and Puppet
Experience with container technology such as Docker and, ideally, with using and managing Kubernetes clusters
Experience working with Linux
Solid understanding of networking and system architecture
Solid understanding of how to deploy, scale and monitor web applications and databases
Good knowledge of AWS and/or GCP platforms and associated best practices
Bachelor's degree in Computer Science/Engineering or equivalent practical experience
Strong communication and documentation skills
Curious and self-motivated learner
Professional approach
Good team member
Organisational and time management skills
Excellent attention to detail
Positive approach to change