Site Reliability Engineer

Google is a global technology company that builds and runs large-scale, massively distributed systems.
Site Reliability
Mid-Level Software Engineer
In-Person
5,000+ Employees
2+ years of experience
Enterprise SaaS · Cloud

Description For Site Reliability Engineer

Site Reliability Engineering (SRE) at Google combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you'll ensure that Google Cloud's services, both internally critical and externally visible systems, maintain the reliability and uptime that customers need while driving continuous improvement. The role involves managing the complex challenges of scale unique to Google Cloud, using expertise in coding, algorithms, complexity analysis, and large-scale system design.

The position emphasizes optimizing existing systems, building infrastructure, and automating processes. Google's SRE culture values diversity, intellectual curiosity, problem-solving, and openness. The organization brings together people with diverse backgrounds and perspectives, encouraging collaboration and risk-taking in a blame-free environment.

You'll work on meaningful projects with self-direction while receiving support and mentorship for growth. Key responsibilities include managing project priorities, deadlines, and deliverables, as well as designing, developing, testing, deploying, maintaining, and enhancing software solutions. The role involves working with network telemetry services, implementing automated troubleshooting, improving monitoring systems, and ensuring service reliability through well-defined SLOs.

This is an excellent opportunity for engineers passionate about large-scale systems, automation, and reliability. You'll collaborate with partner teams, shape technical plans, and directly impact the reliability of Google's production network while working in a supportive, growth-oriented environment.


Responsibilities For Site Reliability Engineer

  • Contribute to landing projects such as Automated Troubleshooting, Better Monitoring and Service Level Objectives (SLOs), and Podification of services
  • Identify needs across network telemetry services; propose, build, and launch cross-service solutions
  • Drive improvements in the team's systems, the infrastructure around them, and the broader network telemetry ecosystem
  • Engage with partner teams and users to make systems reliable, with meaningful SLOs
  • Guide technical plans and goals toward creating reliable systems
  • Operate the network telemetry systems of Google's production network

Requirements For Site Reliability Engineer

Python
Go
Java
Kubernetes
  • Bachelor's degree in Computer Science, a related field, or equivalent practical experience
  • 2 years of experience with data structures/algorithms and software development in one or more programming languages
  • Experience in software engineering, with knowledge of Google's production network
  • Experience researching, proposing, and launching engineering solutions
  • Ability to collaborate with current and prospective partner teams, product teams, and users
  • Excellent collaboration skills in pursuit of technical goals
  • Excellent leadership skills

Benefits For Site Reliability Engineer

Medical Insurance
Parental Leave
Visa Sponsorship
  • Equal opportunity employer
  • Accommodation for special needs
  • Global work environment


Jobs Related To Google Site Reliability Engineer

Software Developer III, Site Reliability Development, Google Cloud

Site Reliability Developer role at Google Cloud focusing on building and maintaining large-scale distributed systems with competitive compensation and growth opportunities.

Technical Program Manager, Site Reliability Engineering

Technical Program Manager position on Google's SRE team, leading infrastructure and service delivery projects with a focus on operational excellence and cross-functional collaboration.

Program Manager, Platforms and Devices Site Reliability Engineering

Lead complex technical programs for Google's Platforms and Devices SRE team, managing cross-functional projects and driving organizational efficiency.

Software Engineer III, Shopping Build Site Reliability Engineer

Site Reliability Engineer role at Google focusing on building and maintaining large-scale distributed systems for Google Cloud services.

Site Reliability Engineer, Ads Quality Infrastructure

Site Reliability Engineer position at Google focusing on Ads Quality Infrastructure, requiring expertise in distributed systems and software development with 2+ years of experience.