Site Reliability Manager, Core Enterprise Systems

Google is a global technology company that builds and maintains large-scale, distributed systems and infrastructure.
Site Reliability
Staff Software Engineer
In-Person
5+ years of experience
Enterprise SaaS

Description For Site Reliability Manager, Core Enterprise Systems

Google's Core Enterprise Systems (CES) SRE team, part of Corporate Engineering-Site Reliability Engineering, is seeking a Site Reliability Manager to lead a team of 6-10 engineers. This role combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. The position focuses on providing SRE support to enterprise applications within Google that power key verticals such as Finance, Legal, Supply Chain, and HR.

As a Site Reliability Manager, you'll be responsible for ensuring Google Cloud's services maintain reliability and appropriate uptime while managing system capacity and performance. The role involves significant software development work focused on optimizing existing systems, building infrastructure, and implementing automation solutions. You'll tackle unique scaling challenges specific to Google Cloud while applying expertise in coding, algorithms, complexity analysis, and large-scale system design.

The team culture emphasizes diversity, intellectual curiosity, problem-solving, and openness. You'll work in a blame-free environment that encourages collaboration, big thinking, and risk-taking. The position offers the opportunity to work on meaningful projects with self-direction while providing support and mentorship for professional growth.

Key responsibilities include managing team operations, developing strategic roadmaps, engaging in service lifecycle management, implementing sustainable scaling solutions, and ensuring effective incident response. The role requires strong technical expertise combined with leadership skills to drive engineering excellence and innovation in Google's enterprise domain.

This is an excellent opportunity for experienced technical leaders who want to impact critical enterprise systems at global scale while leading and developing a team of skilled engineers. The position offers the chance to work with cutting-edge technology while solving complex challenges in system reliability and scalability.


Responsibilities For Site Reliability Manager, Core Enterprise Systems

  • Manage a team of 6-10 site reliability engineers supporting Google's enterprise services
  • Develop roadmaps, plans, and OKRs that advance the maturity of the managed services
  • Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
  • Practice sustainable incident response, ensuring services meet their service level objectives (SLOs)

Requirements For Site Reliability Manager, Core Enterprise Systems

  • Bachelor's degree in Computer Science, a related field, or equivalent practical experience
  • 5 years of experience with software development in one or more programming languages
  • 5 years of experience in algorithms, data structures, analysis, software design/development or Unix/Linux systems, IP networking, performance and application issues
  • 3 years of experience leading projects and working with administration
  • 3 years of people management experience
  • Experience in SAP or other ERP systems (preferred)
  • Experience in an engineering or operations role in Enterprise Applications (preferred)
  • Expertise in building strategic partnerships with internal customers (preferred)
  • Proficiency with enterprise software and with deploying and managing workloads on Cloud (preferred)
