Borg Lifecycle Site Reliability Engineer

Google is a global technology leader specializing in internet-related services and products.
Site Reliability
Mid-Level Software Engineer
In-Person
5,000+ Employees
5+ years of experience
Enterprise SaaS · AI

Description For Borg Lifecycle Site Reliability Engineer

Site Reliability Engineering (SRE) at Google combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As a Borg Lifecycle SRE, you'll be responsible for ensuring Google Cloud's services maintain reliability and appropriate uptime while managing complex challenges of scale unique to Google Cloud. The role involves optimizing existing systems, building infrastructure, and automating processes.

The position requires expertise in coding, algorithms, complexity analysis, and large-scale system design. You'll work specifically with the Borg infrastructure, which is crucial to Google's operations, handling diverse challenges across global infrastructure and working on high-impact projects that drive innovation.

The SRE team at Google embraces a culture of diversity, intellectual curiosity, and problem-solving in a blame-free environment. You'll collaborate with people from various backgrounds and perspectives, working on meaningful projects while receiving support and mentorship for professional growth. The role combines technical leadership, hands-on engineering, and production support, making it ideal for those interested in large-scale distributed systems and infrastructure management.

Key aspects include managing Borg lifecycle phases, supporting different cell flavors, and participating in on-call rotations to ensure 24/7 reliability. You'll work closely with development teams and other SREs to design and implement scalable, reliable, and secure solutions that support various Google initiatives. This role offers an opportunity to impact Google's infrastructure at a global scale while working with cutting-edge technology and talented engineers.

Last updated a month ago

Responsibilities For Borg Lifecycle Site Reliability Engineer

  • Drive the technical direction for the Borg Lifecycle SRE team
  • Provide ongoing engineering and production support for Borg lifecycle phases (turnup, turndown, cell management) and support different Borg cell flavors
  • Work with partner development and SRE teams to design and deliver different programs and projects in a scalable, reliable, and secure manner
  • Design and develop innovative solutions that enable key Google initiatives that scale with the requirements of the business
  • Be a full member of the Borg SRE on-call rotation(s), supporting the Borg ecosystem at global scale and ensuring production keeps running for our users

Requirements For Borg Lifecycle Site Reliability Engineer

Python · Java · Linux
  • 5 years of experience with performance, system architecture, systems data analysis, visualization tools, and debugging
  • Coding and scripting experience in one or more languages (Python, Perl, C, C++ or Java)
  • Master's degree or PhD in Engineering, Computer Science, or a related technical field (preferred)
  • 5 years of experience with UNIX/Linux (preferred)
  • Experience with cloud solutions: open source software communities, cloud networking solutions, distributed-computing technology, and hybrid/multi-cloud connectivity (preferred)

Jobs Related To Google Borg Lifecycle Site Reliability Engineer

Software Engineer, Traffic Trust SRE, DoS Infrastructure

Site Reliability Engineer position at Google focusing on Traffic Trust and DoS Infrastructure, combining software engineering with systems operations to maintain large-scale distributed systems.

Software Engineer III, Site Reliability Engineer

Site Reliability Engineer role at Google focusing on building and maintaining large-scale distributed systems for Google Cloud services.

Databases Site Reliability Engineer

Site Reliability Engineer position at Google focusing on database systems, requiring expertise in distributed systems and infrastructure management.

Software Engineer III, Site Reliability Engineering

Site Reliability Engineer role at Google focusing on building and maintaining large-scale distributed systems with emphasis on reliability and automation.

Software Engineer III, Site Reliability Engineering, Google Cloud

Site Reliability Engineer role at Google Cloud focusing on building and maintaining large-scale distributed systems with emphasis on reliability and automation.