Senior Software Engineer, SRE, Cloud Incident Response

Google is a global technology company that builds and maintains large-scale distributed systems and infrastructure.
Site Reliability
Senior Software Engineer
Contact Company
5,000+ Employees
5+ years of experience
Enterprise SaaS · Cloud

Description For Senior Software Engineer, SRE, Cloud Incident Response

Google's Site Reliability Engineering (SRE) team is seeking a Senior Software Engineer to join its Cloud Incident Response team. This role combines software and systems engineering to build and maintain large-scale distributed systems for Google Cloud Platform. The position focuses on ensuring service reliability, managing critical incidents, and driving continuous improvement through automation.

As an SRE, you'll be responsible for maintaining the stability and reliability of Google Cloud Platform through incident support and management. You'll create training programs and develop end-to-end processes for the incident management lifecycle. The role also involves building tooling to improve cloud state visibility and incident detection.

The ideal candidate will have strong experience in distributed systems, software development, and incident management. You'll be part of a team that values intellectual curiosity, problem-solving, and openness. Google's Technical Infrastructure team offers opportunities to work on meaningful projects while providing support and mentorship for professional growth.

This position requires expertise in system design, troubleshooting, and automation. You'll collaborate with various teams across GCP, contribute to pre-launch activities, and drive improvements in system reliability. The role offers the chance to work on unique scaling challenges while making a significant impact on Google Cloud's infrastructure.

Working at Google means joining a diverse team of professionals from various backgrounds and perspectives. The company promotes self-direction and risk-taking in a blame-free environment, making it an ideal place for engineers who want to tackle complex technical challenges while growing their careers.

Last updated 4 days ago

Responsibilities For Senior Software Engineer, SRE, Cloud Incident Response

  • Ensure Google Cloud Platform (GCP) stability and reliability through critical incident support
  • Create training and end-to-end processes for the incident management lifecycle
  • Build systems and tooling to support the Incident Response team
  • Define and escalate risks in Cloud, and reduce the probability of major incidents
  • Ensure the scalability and reliability of systems throughout their lifecycle

Requirements For Senior Software Engineer, SRE, Cloud Incident Response

Python · Go · Java
  • Bachelor's degree in Computer Science, a related field, or equivalent practical experience
  • 5 years of experience with software development in one or more programming languages
  • 5 years of experience with data structures or algorithms
  • 3 years of experience in designing, analyzing, and troubleshooting distributed systems
  • 2 years of experience leading projects and providing technical leadership
  • Experience in SRE or incident management/response environments
