Site Reliability Engineering (SRE) at Google Cloud combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you'll ensure Google Cloud's services maintain reliability and appropriate uptime while focusing on continuous improvement. The role involves optimizing existing systems, building infrastructure, and implementing automation solutions.
You'll tackle scaling challenges specific to Google Cloud while applying your expertise in coding, algorithms, complexity analysis, and large-scale system design. The position offers the opportunity to work with critical internal and external-facing systems, managing their capacity, performance, and reliability.
The SRE team values diversity, intellectual curiosity, and problem-solving in a blame-free environment. You'll join a collaborative culture that brings together people with varied backgrounds and perspectives and encourages big thinking and risk-taking. The team promotes self-direction on meaningful projects while providing support and mentorship for professional growth.
Working with development teams in California and Bangalore, you'll contribute to projects vital to Google Cloud's success. Your responsibilities include maintaining service availability, implementing scalability solutions, and developing automation to prevent recurring issues. This role offers an opportunity to shape Google Cloud's infrastructure while working with cutting-edge technology and a supportive team.