Google Cloud is seeking a Senior Site Reliability Engineer to join its Technical Infrastructure team. This role combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you'll be responsible for ensuring that Google Cloud's services maintain reliability and appropriate uptime while continuously improving performance and capacity.
The position presents challenges of scale unique to Google Cloud, where you'll apply your expertise in coding, algorithms, complexity analysis, and large-scale system design. You'll be involved in the complete service lifecycle, from design and development to deployment and maintenance, with a strong focus on automation and system optimization.
The role is ideal for someone who thrives in a diverse, intellectually curious environment that promotes problem-solving and openness. You'll work alongside professionals from various backgrounds and perspectives, collaborating in a blame-free culture that encourages big thinking and risk-taking. The team provides strong support and mentorship for continuous learning and growth.
Key aspects of the role include consulting on system design, developing software platforms and frameworks, capacity planning, and launch reviews. You'll also be responsible for monitoring system health, implementing automation for sustainable scaling, and participating in incident response with blameless postmortems.
This position offers the opportunity to work at the heart of Google's technical infrastructure, ensuring millions of users have the best possible experience with Google's services. You'll be part of a team that takes pride in being "engineers' engineers," working on challenging problems at massive scale while contributing to the evolution of Google's next-generation platforms.