Site Reliability Engineering (SRE) at Google Cloud combines software and systems engineering to build and maintain large-scale distributed systems. As a Senior SRE, you'll be responsible for ensuring the reliability and uptime of Google Cloud's services, both internal and customer-facing. The role involves complex challenges of scale unique to Google Cloud, requiring expertise in coding, algorithms, complexity analysis, and large-scale system design.
The position sits within Google's Technical Infrastructure team, which is fundamental to keeping Google's vast product portfolio running. You'll work on optimizing existing systems, building infrastructure, and automating processes to eliminate manual work. The team takes pride in being the engineers' engineers, focusing on maintaining and improving Google's networks and platforms.
The role offers an opportunity to work in a diverse, intellectually curious environment that encourages collaboration, big thinking, and risk-taking in a blame-free culture. Google promotes self-direction on meaningful projects while providing support and mentorship for growth and learning. The organization brings together people with varied backgrounds and perspectives, fostering an inclusive environment where innovation thrives.
You'll be part of a team that manages the entire lifecycle of services, from design and deployment to operation and refinement. This includes consulting on system design, planning capacity, reviewing launches, and maintaining service health through monitoring and automation. The role requires both technical expertise and leadership skills, as you'll be expected to guide projects and provide technical direction while working on complex distributed systems.