Systems Engineer III, Site Reliability Engineering

Google is a global technology leader that specializes in internet-related services and products, including search, cloud computing, software, and hardware.
Site Reliability
Mid-Level Software Engineer
In-Person
5,000+ Employees
2+ years of experience
Enterprise SaaS · Cloud

Description For Systems Engineer III, Site Reliability Engineering

Site Reliability Engineering (SRE) at Google combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you'll ensure that Google Cloud's services maintain reliability and appropriate uptime while monitoring system capacity and performance. The role focuses on optimizing existing systems, building infrastructure, and automating work. You'll tackle scaling challenges specific to Google Cloud, applying expertise in coding, algorithms, complexity analysis, and large-scale system design. Google's SRE culture emphasizes diversity, intellectual curiosity, and problem-solving in a blame-free environment. The position offers the chance to work on meaningful projects, with support and mentorship for professional growth. You'll join a team that brings together diverse backgrounds and perspectives and encourages collaboration and innovation in building and maintaining Google's critical infrastructure.

Last updated 2 days ago

Responsibilities For Systems Engineer III, Site Reliability Engineering

  • Improve the life-cycle of services from inception and design, through deployment, operation, and refinement
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews
  • Provide guidance to other team members on managing the availability and performance of mission-critical services, on building automation to prevent problem recurrence, and on building automated responses for non-exceptional service conditions
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health, and lead sustainable incident response
  • Scale systems sustainably through mechanisms like automation and evolve systems by driving changes that improve reliability and velocity

Requirements For Systems Engineer III, Site Reliability Engineering

  • Bachelor's degree in Computer Science, a related field, or equivalent practical experience
  • 2 years of experience working with one or more programming languages (e.g., Python, C, C++, Java, JavaScript)
  • 2 years of experience with systems administration (e.g., filesystems, inodes, system calls) or networking (e.g., TCP/IP, routing, network topologies and hardware, SDN)
  • Master's degree in Computer Science or Engineering (preferred)
  • Experience in managing and operating global-scale production systems in cloud environments (preferred)
  • Experience architecting, developing, and troubleshooting systems (preferred)
  • Experience designing, analyzing, and troubleshooting distributed systems (preferred)
