Site Reliability Engineering (SRE) at Google combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you'll ensure that Google Cloud's services remain reliable and maintain appropriate uptime, while also focusing on system capacity and performance optimization. The role involves solving complex problems unique to Google Cloud's scale, drawing on expertise in coding, algorithms, complexity analysis, and large-scale system design.
The position is part of Google's Technical Infrastructure team, responsible for the architecture behind all user-facing services. You'll work on developing and maintaining data centers, building next-generation Google platforms, and ensuring networks run optimally for the best user experience.
You'll work in a diverse, intellectually curious environment that values problem-solving and openness. Google encourages self-direction on meaningful projects and provides support and mentorship for growth. You'll manage project priorities, deadlines, and deliverables while designing, developing, testing, deploying, and enhancing software solutions.
Key responsibilities include reviewing code, maintaining documentation, troubleshooting systems, and leading design reviews. The position requires strong technical skills, particularly in distributed systems and software development, along with the ability to collaborate effectively in a team. Join a blame-free culture that brings together people with diverse backgrounds and perspectives and encourages collaboration and innovation.