Site Reliability Engineer

Cover Genius is a Series E insurtech that protects the global customers of the world's largest digital companies including Booking Holdings, Intuit, Uber, Hopper, Ryanair, and more.
Site Reliability
Senior Software Engineer
Hybrid
AI · Finance

Description For Site Reliability Engineer

Cover Genius is a Series E insurtech that protects global customers of major digital companies. As a Site Reliability Engineer, you'll ensure reliable operation of production systems, working across technical areas to automate and improve platforms. Key responsibilities include:

  • Analyzing, testing, and modifying systems for reliability and performance
  • Developing observability tools and dashboards
  • Implementing automation tools, CI/CD pipelines, and reducing toil
  • Troubleshooting production issues
  • Applying AWS and GCP knowledge to maintain cloud infrastructure
  • Collaborating with Software Engineers to improve tools and procedures
  • Developing documentation and runbooks
  • Optimizing computing infrastructure costs

Requirements:

  • Understanding of SRE principles and best practices
  • Experience with modern observability tools (ELK/EFK, Prometheus, Grafana)
  • Scripting skills (Bash, Python, Go)
  • Experience with infrastructure as code (Terraform, CloudFormation)
  • Container technology knowledge (Docker, Kubernetes)
  • Linux experience
  • Networking and system architecture understanding
  • AWS/GCP knowledge
  • Bachelor's degree in Computer Science/Engineering or equivalent experience
  • Strong communication and documentation skills
  • Self-motivated learner with attention to detail

Join a diverse team across 20+ countries, recognized as the #1 fastest-growing company in APAC by the Financial Times in 2020. Be part of an innovative company that values being bold, authentic, purposeful, and inspired.


Responsibilities For Site Reliability Engineer

  • Analyze, test, and modify systems to improve reliability and optimize performance, particularly at the architectural and infrastructure level
  • Develop and maintain observability tooling and dashboards
  • Implement automation tools, frameworks, and CI/CD pipelines to reduce toil
  • Troubleshoot production issues and coordinate with the development team to streamline code deployments
  • Apply AWS and GCP knowledge and skills to create & maintain cloud infrastructure for software projects
  • Design, develop and implement software integrations
  • Collaborate with Software Engineers and other team members with the goal of improving engineering tools, systems, procedures and data security
  • Develop and maintain design and troubleshooting documentation and runbooks
  • Optimize and control costs of the company's computing infrastructure

Requirements For Site Reliability Engineer

  • Understanding of SRE principles and best practices
  • Experience using & configuring modern observability tools such as ELK/EFK, Prometheus, Grafana
  • Comfortable scripting and developing internal tooling with Bash and at least one programming language (e.g. Python, Go)
  • Experience working with infrastructure and configuration as code tools such as Terraform, CloudFormation, Chef, or Puppet
  • Experience with container technology such as Docker, ideally including using and managing Kubernetes clusters
  • Experience working with Linux
  • Solid understanding of networking and system architecture
  • Solid understanding of how to deploy, scale and monitor web applications and databases
  • Good knowledge of AWS and/or GCP platforms and associated best practices
  • Bachelor's degree in Computer Science/Engineering or equivalent practical experience
  • Strong communication and documentation skills
  • Curious and self-motivated learner
  • Professional approach
  • Good team member
  • Organisational and time management skills
  • Excellent attention to detail
  • Positive approach to change
