Site Reliability Engineer

Google is a global technology leader that specializes in internet-related services and products.
Site Reliability
Mid-Level Software Engineer
5,000+ Employees
2+ years of experience
Enterprise SaaS · AI

Description For Site Reliability Engineer

Site Reliability Engineering (SRE) at Google combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you'll ensure that Google Cloud's internal and external services maintain reliability and appropriate uptime while driving continuous improvement. The role involves tackling challenges unique to Google Cloud's scale, drawing on expertise in coding, algorithms, complexity analysis, and large-scale system design.

The position emphasizes optimizing existing systems, building infrastructure, and automating processes. Google's SRE culture values diversity, intellectual curiosity, and problem-solving in a blame-free environment. The team brings together individuals with diverse backgrounds and perspectives, encouraging collaboration and innovation while providing support and mentorship for professional growth.

In this role, you'll be responsible for managing project priorities, deadlines, and deliverables while designing, developing, testing, deploying, maintaining, and enhancing software solutions. You'll work on critical projects like Automated Troubleshooting, Better Monitoring, and Service Level Objectives, while collaborating with partner teams to ensure system reliability and optimal performance.

The position offers the opportunity to work with cutting-edge technology at massive scale, contribute to Google's critical infrastructure, and be part of a team that values continuous learning and innovation. You'll play a crucial role in maintaining and improving the reliability of Google's vast network of services while working alongside some of the industry's best engineers.

Last updated 6 days ago

Responsibilities For Site Reliability Engineer

  • Contribute to landing projects such as Automated Troubleshooting, Better Monitoring, Service Level Objectives (SLOs), and Podification of services
  • Identify needs across network telemetry services. Propose, build and launch cross-service solutions
  • Drive improvements in the team's systems, the infrastructure around them, and the network telemetry ecosystem
  • Engage with partner teams and users to make systems reliable, with SLOs that users can relate to
  • Guide technical plans and goals towards creating reliable systems
  • Operate the network telemetry systems of Google's production network

Requirements For Site Reliability Engineer

Linux
Kubernetes
  • Bachelor's degree in Computer Science, a related field, or equivalent practical experience
  • 2 years of experience with data structures/algorithms and software development in one or more programming languages
  • Experience in software engineering, with knowledge of Google's production network
  • Experience researching, proposing, and launching engineering solutions
  • Ability to collaborate with current and prospective partner teams, product teams, and users
  • Excellent collaboration skills in pursuit of technical goals
  • Excellent leadership skills

