Google's Core Enterprise System (CES) SRE team is seeking a Site Reliability Manager to lead a team of 6-10 engineers supporting critical enterprise services. This role sits within Corporate Engineering-Site Reliability Engineering (SRE) and supports the enterprise applications that power key verticals such as Finance, Legal, Supply Chain, and HR.
The position combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. You'll be responsible for ensuring Google Cloud's services maintain reliability and uptime while continuously improving performance and capacity. The role involves significant technical leadership and requires 5 years of software development experience, along with strong expertise in algorithms, data structures, and Unix/Linux systems.
As a Site Reliability Manager, you'll develop strategic roadmaps and OKRs, lead service lifecycle management from design through deployment, and implement automation for sustainable scaling. The role requires 3 years of people management experience and 3 years of project leadership experience in system administration or networking.
The role offers unique opportunities to tackle complex challenges at Google's scale while working in a culture that values diversity, intellectual curiosity, and problem-solving. You'll collaborate with teams across Google to transform enterprise services through standardized solutions and platforms. The position requires expertise in enterprise applications, cloud workload management, and building strategic partnerships with internal customers.
This is an excellent opportunity for an experienced technical leader who wants to impact critical enterprise systems at one of the world's leading technology companies. You'll have the support and resources to drive innovation while developing your team and advancing your career in site reliability engineering.