Systems Engineer III, Site Reliability Engineering

Google is a global technology leader that specializes in internet-related services and products, including search, cloud computing, software, and hardware.
Site Reliability
Mid-Level Software Engineer
5,000+ Employees
2+ years of experience
Enterprise SaaS · Cloud

Description For Systems Engineer III, Site Reliability Engineering

Site Reliability Engineering (SRE) at Google combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you'll ensure Google Cloud's services maintain reliability and appropriate uptime while focusing on system optimization, infrastructure development, and automation. The role involves managing complex challenges unique to Google Cloud's scale, utilizing expertise in coding, algorithms, complexity analysis, and large-scale system design. Google's SRE team emphasizes diversity, intellectual curiosity, and problem-solving in a blame-free environment. The position offers opportunities to work on meaningful projects with support and mentorship for professional growth. You'll be part of a team that manages critical infrastructure, develops automation solutions, and ensures the smooth operation of Google's vast service network. The role combines technical expertise with system design, requiring both hands-on engineering skills and strategic thinking to maintain and improve Google's global infrastructure.

Last updated 22 days ago

Responsibilities For Systems Engineer III, Site Reliability Engineering

  • Improve the whole life-cycle of services from inception and design, through deployment, operation, and refinement
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews
  • Provide guidance to other team members on managing the availability and performance of mission-critical services, on building automation to prevent problem recurrence, and on building automated responses for non-exceptional service conditions
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health, and lead sustainable incident response
  • Scale systems sustainably through mechanisms like automation and evolve systems by driving changes that improve reliability and velocity

Requirements For Systems Engineer III, Site Reliability Engineering

  • Bachelor's degree in Computer Science, a related field, or equivalent practical experience
  • 2+ years of experience with one or more programming languages (e.g., Python, C, C++, Java, JavaScript)
  • 2 years of experience with systems administration (e.g., filesystems, inodes, system calls) or networking (e.g., TCP/IP, routing, network topologies and hardware, SDN)
  • Master's degree in Computer Science or Engineering (preferred)
  • Experience in managing and operating global-scale production systems in cloud environments (preferred)
  • Experience architecting, developing, and troubleshooting systems (preferred)
  • Experience designing, analyzing, and troubleshooting distributed systems (preferred)

Jobs Related To Google Systems Engineer III, Site Reliability Engineering

Software Engineer, Traffic Trust SRE, DoS Infrastructure

Site Reliability Engineer position at Google focusing on Traffic Trust and DoS Infrastructure, combining software engineering with systems operations to maintain large-scale distributed systems.

Software Engineer III, Site Reliability Engineer

Site Reliability Engineer role at Google focusing on building and maintaining large-scale distributed systems for Google Cloud services.

Databases Site Reliability Engineer

Site Reliability Engineer position at Google focusing on database systems, requiring expertise in distributed systems and infrastructure management.

Software Engineer III, Site Reliability Engineering

Site Reliability Engineer role at Google focusing on building and maintaining large-scale distributed systems with emphasis on reliability and automation.

Software Engineer III, Site Reliability Engineering, Google Cloud

Site Reliability Engineer role at Google Cloud focusing on building and maintaining large-scale distributed systems with emphasis on reliability and automation.