Site Reliability Engineer

Cover Genius is a Series E insurtech that protects the global customers of the world's largest digital companies including Booking Holdings, Intuit, Uber, Hopper, Ryanair, and more.
Site Reliability
Senior Software Engineer
Hybrid
AI · Finance

Description For Site Reliability Engineer

Cover Genius is a Series E insurtech that protects global customers of major digital companies. As a Site Reliability Engineer, you'll ensure reliable operation of production systems, working across technical areas to automate and improve platforms. Key responsibilities include:

  • Analyzing, testing, and modifying systems for reliability and performance
  • Developing observability tools and dashboards
  • Implementing automation tools, CI/CD pipelines, and reducing toil
  • Troubleshooting production issues
  • Applying AWS and GCP knowledge to maintain cloud infrastructure
  • Collaborating with Software Engineers to improve tools and procedures
  • Developing documentation and runbooks
  • Optimizing computing infrastructure costs

Requirements:

  • Understanding of SRE principles and best practices
  • Experience with modern observability tools (ELK/EFK, Prometheus, Grafana)
  • Scripting skills (Bash, Python, Go)
  • Experience with infrastructure as code (Terraform, CloudFormation)
  • Container technology knowledge (Docker, Kubernetes)
  • Linux experience
  • Networking and system architecture understanding
  • AWS/GCP knowledge
  • Bachelor's degree in Computer Science/Engineering or equivalent experience
  • Strong communication and documentation skills
  • Self-motivated learner with attention to detail

Join a diverse team across 20+ countries, recognized as the #1 fastest-growing company in APAC by the Financial Times in 2020. Be part of an innovative company that values being bold, authentic, purposeful, and inspired.

Last updated 4 months ago

Responsibilities For Site Reliability Engineer

  • Analyze, test, and modify systems to improve reliability and optimize performance, particularly at the architectural and infrastructure level
  • Develop and maintain observability tooling and dashboards
  • Implement automation tools, frameworks, and CI/CD pipelines to reduce toil
  • Troubleshoot production issues and coordinate with the development team to streamline code deployments
  • Apply AWS and GCP knowledge and skills to create & maintain cloud infrastructure for software projects
  • Design, develop and implement software integrations
  • Collaborate with Software Engineers and other team members with the goal of improving engineering tools, systems, procedures and data security
  • Develop and maintain design and troubleshooting documentation and runbooks
  • Optimize and control costs of the company's computing infrastructure

Requirements For Site Reliability Engineer

Linux · Python · Go · Kubernetes
  • Understanding of SRE principles and best practices
  • Experience using and configuring modern observability tools such as ELK/EFK, Prometheus, and Grafana
  • Comfortable scripting and developing internal tooling with Bash and at least one programming language (e.g. Python, Go)
  • Experience working with infrastructure and configuration as code tools such as Terraform, CloudFormation, Chef, and Puppet
  • Experience with container technology such as Docker, and ideally with using and managing Kubernetes clusters
  • Experience working with Linux
  • Solid understanding of networking and system architecture
  • Solid understanding of how to deploy, scale and monitor web applications and databases
  • Good knowledge of AWS and/or GCP platforms and associated best practices
  • Bachelor's degree in Computer Science/Engineering or equivalent practical experience
  • Strong communication and documentation skills
  • Curious and self-motivated learner
  • Professional approach
  • Good team member
  • Organisational and time management skills
  • Excellent attention to detail
  • Positive approach to change
