Site Reliability Engineering (SRE) at Google combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you'll ensure Google Cloud's services maintain reliability and appropriate uptime while managing capacity and performance. The role focuses on optimizing existing systems, building infrastructure, and developing automation.
You'll tackle scaling challenges specific to Google Cloud, applying expertise in coding, algorithms, complexity analysis, and large-scale system design. The SRE team values diversity, intellectual curiosity, and problem-solving in a blame-free environment. Google encourages collaboration, big thinking, and risk-taking while providing support and mentorship for growth.
The position offers the opportunity to work with distributed systems at massive scale, contribute to critical infrastructure, and be part of a team that directly impacts Google Cloud's reliability. You'll work alongside diverse professionals, participate in design reviews, and help maintain Google's high standards for system reliability and performance.
This role is a strong fit for engineers who enjoy the intersection of software development and systems engineering, are passionate about large-scale infrastructure, and want to work on problems that affect millions of users. You'll have the chance to grow technically and professionally while contributing to Google's world-class infrastructure.