Site Reliability Engineering (SRE) at Google combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you'll ensure that Google Cloud's services remain reliable and maintain appropriate uptime, and you'll monitor system capacity and performance. The role involves optimizing existing systems, building infrastructure, and automating processes.
You'll tackle scaling challenges unique to Google Cloud, applying expertise in coding, algorithms, complexity analysis, and large-scale system design. The team values diversity, intellectual curiosity, and problem-solving in a blame-free environment. You'll work alongside professionals from varied backgrounds, collaborating on meaningful projects while receiving support and mentorship for growth.
The Technical Infrastructure team is fundamental to Google's product portfolio, developing and maintaining data centers and building next-generation platforms. The team takes pride in being the engineers' engineers, ensuring networks run optimally for the best user experience. You'll manage project priorities and deadlines while designing, developing, and maintaining software solutions.
The role offers opportunities to work with cutting-edge distributed systems, contribute to Google's critical infrastructure, and impact billions of users. You'll join a culture that promotes self-direction while providing the support needed to learn and grow in your career.