Google's Core Enterprise System (CES) SRE team, part of Corporate Engineering-Site Reliability Engineering, is seeking a Site Reliability Manager to lead a team of 6-10 engineers. The role combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. The team provides SRE support for enterprise applications within Google that power key verticals such as Finance, Legal, Supply Chain, and HR.
As a Site Reliability Manager, you'll be responsible for ensuring Google Cloud's services maintain reliability and appropriate uptime while managing system capacity and performance. The role involves significant software development work focused on optimizing existing systems, building infrastructure, and implementing automation solutions. You'll tackle unique scaling challenges specific to Google Cloud while applying expertise in coding, algorithms, complexity analysis, and large-scale system design.
The team culture emphasizes diversity, intellectual curiosity, problem-solving, and openness. You'll work in a blame-free environment that encourages collaboration, big thinking, and risk-taking. You'll have the self-direction to work on meaningful projects, along with the support and mentorship needed to grow professionally.
Key responsibilities include managing team operations, developing strategic roadmaps, engaging in service lifecycle management, implementing sustainable scaling solutions, and ensuring effective incident response. The role requires strong technical expertise combined with leadership skills to drive engineering excellence and innovation in Google's enterprise domain.
This role suits experienced technical leaders who want to have an impact on critical enterprise systems at global scale while leading and developing a team of skilled engineers. It offers the chance to solve complex system reliability and scalability challenges.