Google Cloud's Site Reliability Engineering (SRE) team combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you'll ensure that Google Cloud's services stay reliable and maintain appropriate uptime, and you'll manage their performance and capacity. The role focuses on optimizing existing systems, building infrastructure, and automating processes.
The position presents challenges of scale unique to Google Cloud and requires expertise in coding, algorithms, complexity analysis, and large-scale system design. SRE culture emphasizes diversity, intellectual curiosity, and problem-solving in a blame-free environment. The team welcomes people with diverse backgrounds and perspectives, and it encourages collaboration and risk-taking.
Working on the Technical Infrastructure team, you'll be part of the backbone that keeps Google's services running smoothly. The team is responsible for developing and maintaining data centers, building next-generation Google platforms, and ensuring networks operate at peak performance. This role combines hands-on engineering with systems architecture, offering opportunities to design, implement, and maintain critical infrastructure at global scale.
The position offers growth through self-directed, meaningful projects, backed by support and mentorship. You'll work with cutting-edge technology, solve complex distributed systems challenges, and contribute to the evolution of Google's infrastructure.