Senior Systems Engineer, Site Reliability Engineering

Google is a global technology company that builds and maintains large-scale distributed systems and infrastructure.
Site Reliability
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
Enterprise SaaS · Cloud

Description For Senior Systems Engineer, Site Reliability Engineering

Google's Site Reliability Engineering (SRE) team is seeking a Senior Systems Engineer to join its technical infrastructure team. This role combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you'll be responsible for ensuring Google Cloud's services maintain reliability and appropriate uptime while monitoring system capacity and performance.

The position requires extensive experience in distributed systems, with a focus on designing, analyzing, and troubleshooting large-scale infrastructure. You'll work on optimizing existing systems, building infrastructure, and creating automation solutions to eliminate manual work. The role offers unique challenges of scale specific to Google Cloud, where you'll apply your expertise in coding, algorithms, complexity analysis, and large-scale system design.

SRE's culture emphasizes diversity, intellectual curiosity, and problem-solving in a blame-free environment. The team brings together individuals with varied backgrounds and perspectives, encouraging collaboration and innovative thinking. You'll have the opportunity to work on meaningful projects with significant impact, while receiving support and mentorship for continuous learning and growth.

Key responsibilities include improving service lifecycles, providing technical guidance to team members, maintaining service reliability through monitoring and metrics, leading incident responses, and driving automation initiatives. You'll also be involved in system design consulting, capacity planning, and launch reviews for new services.

The ideal candidate will have at least 5 years of programming experience, strong knowledge of Unix/Linux systems, and proven experience with distributed systems. Leadership experience and excellent communication skills are essential, as you'll be guiding team members and collaborating across various technical teams.

Join Google's Technical Infrastructure team to help build and maintain the architecture that powers Google's vast product portfolio. You'll be part of a team that takes pride in being the engineers' engineers, focusing on creating robust, scalable solutions that ensure the best possible user experience.

Last updated a day ago

Responsibilities For Senior Systems Engineer, Site Reliability Engineering

  • Improve the whole lifecycle of services from inception and design, through deployment, operation, and refinement
  • Provide guidance to other team members on managing availability and performance of mission critical services
  • Maintain services by measuring and monitoring availability, latency, and overall system health
  • Lead sustainable incident response and blameless postmortems
  • Scale systems sustainably through automation
  • Support services before they go live through system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews

Requirements For Senior Systems Engineer, Site Reliability Engineering

  • Bachelor's degree in Computer Science, a related field, or equivalent practical experience
  • 5 years of experience with programming in one or more programming languages
  • 3 years of experience designing, analyzing, and troubleshooting distributed systems
  • Experience with Unix/Linux systems internals and administration
  • 2 years of experience leading projects
  • Experience in computing, distributed systems, storage, or networking
  • Expertise in designing, analyzing, and troubleshooting large-scale distributed systems
  • Ability to debug and optimize code and to automate routine tasks
  • Systematic problem-solving approach
  • Effective verbal and written communication skills

