Google's Site Reliability Engineering (SRE) team is seeking a Senior Site Reliability Engineer to join its Cloud Spanner team. This role combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you'll be responsible for ensuring that Google's services stay reliable with appropriate uptime, and for monitoring system capacity and performance.
The role involves working with complex distributed systems at Google's scale, focusing on optimizing existing systems, building infrastructure, and automating processes. You'll be part of a team that values diversity, intellectual curiosity, and problem-solving in a blame-free environment. The position requires strong expertise in coding, algorithms, complexity analysis, and large-scale system design.
Working in the Technical Infrastructure team, you'll be instrumental in maintaining and developing the data centers and platforms that power Google's extensive product portfolio. You'll collaborate with Software Engineering teams to evolve Cloud Spanner's capabilities while keeping the system reliable and efficient.
This is an excellent opportunity for experienced engineers who are passionate about distributed systems, have a strong background in system administration or networking, and want to work on challenging problems at scale. You'll join an engineering culture that promotes self-direction, meaningful projects, and continuous learning, backed by strong support and mentorship.
The ideal candidate will bring 5+ years of experience in distributed systems and programming, with a focus on reliability and efficiency. You'll work with technologies such as Python, Go, and Linux while tackling complex system architecture and infrastructure challenges. Join Google's SRE team to help build and maintain the systems that power one of the world's largest technology companies.