Lead Site Reliability Engineer

Bumble Inc. is the parent company of Bumble, Badoo, Fruitz, and Official, dating apps built with women at the center that connect people through dating, friendship, and networking.

Description For Lead Site Reliability Engineer

Bumble Inc., the parent company behind the dating apps Bumble, Badoo, Fruitz, and Official, is seeking a Lead Site Reliability Engineer to join its team in a hybrid working arrangement in London. The role is central to ensuring the reliability, scalability, and performance of software systems while bridging the gap between development, security, and operations.

As an SRE at Bumble, you'll maintain and improve the infrastructure that powers multiple dating platforms serving millions of users worldwide. You'll work with Kubernetes, Python or Go, and monitoring tools such as Grafana, Prometheus, Elasticsearch, and Jaeger to build and maintain robust systems.

The ideal candidate will bring strong technical expertise in infrastructure automation, system reliability, and DevOps practices, combined with excellent problem-solving and communication skills. You'll be responsible for designing scalable solutions, implementing infrastructure as code, and ensuring system health through comprehensive monitoring and quick incident response.

Bumble offers an inclusive work environment that welcomes diversity in all forms. The company strongly encourages applications from people of all backgrounds, including LGBTQ+ individuals, veterans, parents, and people with disabilities. This role offers the opportunity to work with a mission-driven company that's transforming how people build relationships through technology.

This position is perfect for someone who is passionate about system reliability, thrives in a collaborative environment, and is committed to continuous learning and improvement. You'll be part of a team that values innovation, quality, and the delivery of exceptional experiences to stakeholders.

Last updated 14 days ago

Responsibilities For Lead Site Reliability Engineer

  • Design and build new tools and services to solve complex problems
  • Build automation frameworks to streamline repetitive tasks
  • Design and maintain scalable, highly available and fault-tolerant systems
  • Build and maintain observability tooling including logging, monitoring, tracing and alerting systems
  • Develop and maintain automation tooling to reduce manual intervention
  • Implement infrastructure as code (IaC) for infrastructure provisioning
  • Monitor system health and performance, identifying and fixing issues
  • Respond to system outages, diagnosing root causes and implementing preventative measures
  • Collaborate with engineering teams and security engineers
  • Participate in on-call rotations
  • Create and maintain documentation

Requirements For Lead Site Reliability Engineer

  • Excellent problem-solving and analytical skills
  • Strong communication and collaboration skills
  • Proficiency in Python or Go
  • Experience with CI/CD pipelines
  • Strong proficiency with Kubernetes architecture
  • Prior experience in SRE, system administration, or DevOps roles
  • Strong proficiency with Linux/Unix operating systems
  • Proficiency with Puppet for configuration management
  • Experience with monitoring platforms (Grafana, Prometheus, Elasticsearch, Jaeger)
  • Experience with cloud platforms such as GCP or AWS
  • Familiarity with SQL databases and message brokers such as Kafka
  • Solution-oriented, with a passion for problem-solving
  • Commitment to quality and continuous learning
