Site Reliability Engineering (SRE) at Google Cloud combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you'll keep Google Cloud's services reliable, with appropriate uptime, while managing their performance and capacity. The role focuses on optimizing existing systems, building infrastructure, and developing automation.
You'll tackle scaling challenges unique to Google Cloud, applying expertise in coding, algorithms, complexity analysis, and large-scale system design. The SRE team values diversity, intellectual curiosity, and problem-solving in a blame-free environment. Google encourages collaboration, big thinking, and risk-taking while providing support and mentorship for growth.
The Technical Infrastructure team is fundamental to Google's operations, developing and maintaining data centers and building next-generation platforms. The team takes pride in being the engineers' engineers, ensuring networks run optimally for the best user experience. This role offers the opportunity to work with cutting-edge technology at massive scale while contributing to Google's core infrastructure.
The position combines technical depth with leadership opportunities, requiring both hands-on engineering skills and project management capabilities. You'll work in a collaborative environment that promotes self-direction on meaningful projects and keeps a strong focus on system reliability and performance optimization.