Site Reliability Engineering (SRE) at Google is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you'll ensure that Google's services have appropriate reliability and uptime while maintaining performance and capacity. The role focuses on optimizing existing systems, building infrastructure, and eliminating operational work through automation.
SRE at Google emphasizes limiting operational work, conducting blameless postmortems, and proactively identifying potential outages. The culture promotes diversity, intellectual curiosity, and problem-solving in a blame-free environment. You'll work with professionals from a range of backgrounds and perspectives, collaborating on meaningful projects while receiving support and mentorship for growth.
The position offers a competitive compensation package that includes base salary, bonus, equity, and benefits. You'll be responsible for managing project priorities, deadlines, and deliverables while designing, developing, testing, deploying, maintaining, and enhancing software solutions. This role is an opportunity to work on critical systems that affect millions of users as part of a team that values continuous learning and improvement.
As an SRE, you'll use a broad range of tools and approaches to solve complex problems, working on both internally critical and externally visible systems. The role calls for creative engineering solutions to operations problems. You'll be part of a team responsible for understanding how systems relate to each other and for maintaining the big picture of Google's infrastructure.