Cover Genius is a Series E insurtech that protects global customers of major digital companies. As a Site Reliability Engineer, you'll ensure reliable operation of production systems, working across technical areas to automate and improve platforms.
Key responsibilities:
- Analyzing, testing, and modifying systems for reliability and performance
- Developing observability tools and dashboards
- Implementing automation tools and CI/CD pipelines, and reducing toil
- Troubleshooting production issues
- Applying AWS and GCP knowledge to maintain cloud infrastructure
- Collaborating with Software Engineers to improve tools and procedures
- Developing documentation and runbooks
- Optimizing computing infrastructure costs
Requirements:
- Understanding of SRE principles and best practices
- Experience with modern observability tools (ELK/EFK, Prometheus, Grafana)
- Scripting skills (Bash, Python, Go)
- Experience with infrastructure as code (Terraform, CloudFormation)
- Container technology knowledge (Docker, Kubernetes)
- Linux experience
- Networking and system architecture understanding
- AWS/GCP knowledge
- Bachelor's degree in Computer Science/Engineering or equivalent experience
- Strong communication and documentation skills
- Self-motivated learner with attention to detail
Join a diverse team spread across 20+ countries. Cover Genius was recognized by the Financial Times as the #1 fastest-growing company in APAC in 2020. Be part of an innovative company that values being bold, authentic, purposeful, and inspired.