Site Reliability Engineer

Google is a global technology company that builds and runs large-scale, massively distributed systems and services.
Site Reliability
Mid-Level Software Engineer
In-Person
5,000+ Employees
2+ years of experience
Enterprise SaaS · Cloud

Description For Site Reliability Engineer

Google's Site Reliability Engineering (SRE) team is at the forefront of maintaining and optimizing the company's massive distributed systems. This role combines software and systems engineering to ensure Google Cloud's services maintain optimal reliability and performance. As an SRE, you'll tackle unique scaling challenges while leveraging your expertise in coding, algorithms, and large-scale system design.

The position offers an opportunity to work on critical infrastructure that powers both internal and external-facing systems. You'll be involved in performance optimization, automation, and capacity planning. The role demands a strong foundation in software development and systems engineering, with a focus on building fault-tolerant systems.

Google's SRE team cultivates a culture of diversity, intellectual curiosity, and problem-solving. The environment promotes collaboration among professionals from various backgrounds and experiences. You'll work in a blame-free setting that encourages innovation and risk-taking, with ample support for professional growth and mentorship.

Key aspects of the role include managing project priorities, developing software solutions, and ensuring system reliability through monitoring and optimization. You'll work with cutting-edge technologies and contribute to projects that directly impact Google's global infrastructure. The position offers the perfect blend of technical challenges and collaborative opportunities, making it ideal for engineers passionate about large-scale systems and reliability engineering.

Responsibilities For Site Reliability Engineer

  • Contribute to and land projects such as Automated Troubleshooting, Better Monitoring and Service Level Objectives (SLOs), and Podification of services
  • Identify needs across network telemetry services. Propose, build and launch cross-service solutions
  • Motivate improvements in the team's systems, the infrastructure around them, and the network telemetry ecosystem
  • Engage with partner teams and users to make systems reliable, with relatable SLOs
  • Guide technical plans and goals towards creating reliable systems
  • Operate the network telemetry systems of Google's production network

Requirements For Site Reliability Engineer

Python · Java · Go · Kubernetes
  • Bachelor's degree in Computer Science, a related field, or equivalent practical experience
  • 2 years of experience with data structures/algorithms and software development in one or more programming languages
  • Experience in software engineering with knowledge of Google's production network
  • Experience researching, proposing, and launching engineering solutions
  • Ability to collaborate with current and prospective partner teams, product, and users
  • Excellent collaboration skills in working toward technical goals
  • Excellent leadership skills
