Site Reliability Engineering (SRE) at Google is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As a Software Developer II on the SRE team, you'll be responsible for keeping Google's services reliable and available while optimizing performance and capacity. The role involves creative problem-solving, automation, and system optimization.
SRE at Google follows key principles including limiting operational work, conducting blameless postmortems, and proactively identifying potential outages. The team embraces a culture of diversity, intellectual curiosity, and openness, bringing together people with varied backgrounds and perspectives. You'll work in a blame-free environment that encourages collaboration, big thinking, and risk-taking.
The position offers competitive compensation including base salary, bonus, equity, and comprehensive benefits. You'll have opportunities to work on meaningful projects while receiving support and mentorship for professional growth. The role involves managing project priorities, deadlines, and deliverables, as well as designing, developing, testing, deploying, maintaining, and enhancing software solutions.
Google's SRE team is known for its innovative approaches to operations problems, building its own engineering solutions and maintaining a fast rate of improvement. You'll be part of a team that's responsible for both internally critical and externally visible systems, using a broad spectrum of tools and approaches to solve complex problems. This role offers a unique opportunity to impact Google's infrastructure at scale while working with cutting-edge technology and brilliant colleagues.