Lead Site Reliability Engineer

Parent company of Bumble, Badoo, Fruitz and Official, pioneering dating apps built with women at the center, connecting people through dating, friendship, and networking.
Site Reliability
Staff Software Engineer
Hybrid
Consumer

Description For Lead Site Reliability Engineer

Bumble Inc., the parent company behind the dating apps Bumble, Badoo, Fruitz, and Official, is seeking a Lead Site Reliability Engineer to join its team in London on a hybrid working arrangement. The role is central to ensuring the reliability, scalability, and performance of its software systems while bridging development, security, and operations.

As an SRE at Bumble, you'll maintain and improve the infrastructure that powers multiple dating platforms serving millions of users worldwide. You'll work with Kubernetes, Python or Go, and monitoring tools such as Grafana, Prometheus, Elasticsearch, and Jaeger to build and maintain robust systems.

The ideal candidate will bring strong technical expertise in infrastructure automation, system reliability, and DevOps practices, combined with excellent problem-solving and communication skills. You'll be responsible for designing scalable solutions, implementing infrastructure as code, and ensuring system health through comprehensive monitoring and quick incident response.

Bumble offers an inclusive work environment that welcomes diversity in all forms. The company strongly encourages applications from people of all backgrounds, including LGBTQ+ individuals, veterans, parents, and people with disabilities. This role offers the opportunity to work with a mission-driven company that's transforming how people build relationships through technology.

This position is perfect for someone who is passionate about system reliability, thrives in a collaborative environment, and is committed to continuous learning and improvement. You'll be part of a team that values innovation, quality, and the delivery of exceptional experiences to stakeholders.

Last updated 2 months ago

Responsibilities For Lead Site Reliability Engineer

  • Design and build new tools and services to solve complex problems
  • Build automation frameworks to streamline repetitive tasks
  • Design and maintain scalable, highly available and fault-tolerant systems
  • Build and maintain observability tooling including logging, monitoring, tracing and alerting systems
  • Develop and maintain automation tooling to reduce manual intervention
  • Implement infrastructure as code (IaC) for infrastructure provisioning
  • Monitor system health and performance, identifying and fixing issues
  • Respond to system outages, troubleshooting root causes and implementing preventative measures
  • Collaborate with engineering teams and security engineers
  • Participate in on-call rotations
  • Create and maintain documentation

Requirements For Lead Site Reliability Engineer

Python
Go
Kubernetes
Linux
Kafka
  • Excellent problem-solving and analytical skills
  • Strong communication and collaboration skills
  • Proficiency in Python or Go
  • Experience with CI/CD pipelines
  • Strong proficiency with Kubernetes architecture
  • Prior experience in SRE, system administration, or DevOps roles
  • Strong proficiency with Linux/Unix operating systems
  • Proficiency with Puppet for configuration management
  • Experience with monitoring platforms (Grafana, Prometheus, Elasticsearch, Jaeger)
  • Experience with cloud platforms such as GCP or AWS
  • Familiarity with SQL databases and message brokers such as Kafka
  • Solution-oriented, with a passion for problem-solving
  • Commitment to quality and continuous learning
