Google's Site Reliability Engineering (SRE) team is at the forefront of maintaining and optimizing the company's massive distributed systems. This role combines software and systems engineering to ensure Google Cloud's services maintain reliability and uptime while constantly improving performance. As a Senior SRE, you'll tackle unique scaling challenges while leveraging your expertise in coding, algorithms, and system design.
The position offers an opportunity to work with Google Cloud's infrastructure, where you'll be responsible for the entire service lifecycle, from design and deployment to operation and refinement. You'll be part of a team that values diversity, intellectual curiosity, and problem-solving in a blame-free environment. The role involves both maintaining existing systems and building new infrastructure through automation.
You'll join the Technical Infrastructure team, which is fundamental to Google's product portfolio. The team takes pride in being the "engineers' engineers," focusing on maintaining data centers, developing next-generation platforms, and ensuring optimal network performance. This role requires a strong background in distributed systems, with opportunities to lead projects and provide technical direction.
The ideal candidate will have extensive experience in software development, data structures, and algorithms, combined with a proven track record of designing and troubleshooting large-scale distributed systems. You'll work in a collaborative environment that encourages self-direction while providing support and mentorship for continuous learning and growth.
This position offers the chance to impact millions of users worldwide while working with cutting-edge technology and some of the industry's brightest minds. You'll be instrumental in ensuring Google's services remain reliable, scalable, and efficient, while contributing to the evolution of one of the world's most sophisticated technical infrastructures.