Lead Site Reliability Engineer

Parent company of Bumble, Badoo, Bumble For Friends, and Geneva, pioneering dating and social connection platforms founded in 2014.
$198,000 - $250,000
Site Reliability
Staff Software Engineer
Hybrid
1,000 - 5,000 Employees
8+ years of experience
Enterprise SaaS

Description For Lead Site Reliability Engineer

Bumble Inc., the parent company behind popular platforms like Bumble, Badoo, and Bumble For Friends, is seeking a Lead Site Reliability Engineer to join their Infrastructure Engineering team. This role is crucial in ensuring the reliability, scalability, and performance of their software systems that serve millions of users worldwide.

As an SRE at Bumble, you'll bridge the gap between development, security, and operations, working on sophisticated systems that power one of the world's leading social connection platforms. You'll be responsible for designing and implementing robust infrastructure solutions, building automation frameworks, and maintaining high-availability systems that directly impact user experience.

The ideal candidate brings strong technical expertise in Python/Golang, Kubernetes, and cloud architectures, combined with excellent problem-solving and communication skills. You'll work with cutting-edge technologies and tools while collaborating with cross-functional teams to drive infrastructure improvements and system reliability.

This position offers competitive compensation ($198,000 - $250,000) and the flexibility of hybrid work arrangements in either New York City or Austin. You'll be part of an inclusive organization that values diversity and encourages applications from people of all backgrounds. Join Bumble to help build and maintain the technical infrastructure that enables meaningful connections for millions of users worldwide.

Last updated a month ago

Responsibilities For Lead Site Reliability Engineer

  • Design and build new tools and services from the ground up to solve complex problems
  • Build automation frameworks to streamline repetitive tasks
  • Design and maintain scalable, highly available and fault-tolerant systems
  • Build and maintain observability tooling, including logging, monitoring, tracing and alerting systems
  • Develop and maintain automation tooling to reduce manual intervention
  • Implement infrastructure as code (IaC) for infrastructure provisioning
  • Monitor system health and performance, identifying and fixing issues
  • Respond to system outages, troubleshooting root causes and implementing preventative measures
  • Collaborate with engineering teams and security engineers to improve system reliability, security and performance
  • Participate in on-call rotations
  • Create and maintain documentation to improve knowledge sharing across teams

Requirements For Lead Site Reliability Engineer

Python
Go
Kubernetes
Linux
Kafka
  • Excellent problem-solving and analytical skills
  • Strong communication and collaboration skills
  • Proficiency in at least one of Python or Golang
  • Experience with CI/CD pipelines
  • Strong proficiency with Kubernetes architecture
  • Prior experience in SRE, system administration or DevOps roles
  • Strong proficiency with Linux/Unix operating systems
  • Proficiency with Puppet for configuration management
  • Hands-on experience with monitoring and observability platforms
  • Experience with cloud architectures on platforms such as GCP or AWS
  • Familiarity with SQL databases and message broker systems such as Kafka
  • Solution-orientated professional with a passion for problem-solving
  • Thrive in a collaborative environment
  • Commitment to continuous learning

Jobs Related To Bumble Inc. Lead Site Reliability Engineer

Lead Site Reliability Engineer

Lead Site Reliability Engineer position at Bumble Inc., focusing on ensuring system reliability and scalability while working with cutting-edge technologies in a hybrid work environment in London.

Site Reliability Developer Opportunities - Mexico

Site Reliability Developer role at Oracle Mexico, focusing on cloud infrastructure and automation for Database Autonomous Recovery Service.

Staff Software Engineer, Reliability Engineering

Staff Software Engineer position at Airbnb focusing on Site Reliability Engineering, building and maintaining systems for service reliability at scale with incident management responsibilities.

Site Reliability Engineer – AIOps

Senior Site Reliability Engineer role focusing on AIOps at Oracle, building AI-driven solutions for cloud infrastructure reliability and automation.
