Site Reliability Engineering (SRE) at Google combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you'll ensure that Google's services maintain their reliability and appropriate uptime while monitoring system capacity and performance. The role focuses on optimizing existing systems, building infrastructure, and driving automation.
The Technical Infrastructure team is responsible for the architecture that powers Google's product portfolio. From developing and maintaining data centers to building next-generation Google platforms, this team makes Google's products possible. The team takes pride in being engineers' engineers and keeps Google's networks running smoothly to deliver the best possible user experience.
Working with Google Cloud Platform's Spanner database, you'll collaborate with various teams to ensure system manageability and efficiency. The role involves project planning and execution for improved reliability, participating in on-call rotations, and managing GCP Spanner allocations.
Google offers a diverse and inclusive environment where intellectual curiosity, problem-solving, and openness are valued. The company brings together people with varied backgrounds and perspectives, encouraging collaboration and innovation within a blameless culture. You'll have opportunities to work on meaningful projects while receiving support and mentorship for professional growth.
Join a team that's at the forefront of large-scale system design and maintenance, where your expertise in coding, algorithms, and complexity analysis will be put to use in solving unique challenges at Google's scale.