Site Reliability Engineer

Google is a global technology company that specializes in internet-related services and products, including search, cloud computing, software, and hardware.
Site Reliability
Mid-Level Software Engineer
5,000+ Employees
2+ years of experience
AI · Enterprise SaaS · Cloud

Description For Site Reliability Engineer

Site Reliability Engineering (SRE) at Google Cloud combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you'll ensure that Google Cloud's services, both internally critical and externally visible, have reliability and uptime appropriate to customer needs, with a fast rate of improvement. You'll monitor system capacity and performance, with a focus on optimizing existing systems, building infrastructure, and eliminating manual work through automation.

The role offers unique challenges of scale specific to Google Cloud, allowing you to apply your expertise in coding, algorithms, complexity analysis, and large-scale system design. SRE culture values diversity, intellectual curiosity, problem-solving, and openness. The organization brings together people with varied backgrounds and perspectives, encouraging collaboration, big thinking, and risk-taking in a blame-free environment.

Key responsibilities include:

  1. Maintaining engineering excellence on Telecommunication Infrastructure (TI) and Cloud turnup.
  2. Reviewing code and providing feedback to ensure best practices.
  3. Contributing to documentation and educational content.
  4. Triaging and resolving product or system issues.
  5. Leading design reviews to decide among available technologies.

This role requires a Bachelor's degree in Computer Science or related field (or equivalent experience), and at least 2 years of experience with data structures/algorithms and software development in languages like Java, Python, Go, C, or C++. Preferred qualifications include experience with distributed systems, storage, or networking, and strong problem-solving skills.

Google is committed to diversity, equal opportunity, and creating a culture of belonging. The company offers accommodations for applicants who need them and requires English proficiency to support efficient global collaboration.

Last updated 2 days ago

Responsibilities For Site Reliability Engineer

  • Maintain engineering excellence on both Telecommunication Infrastructure (TI) and Cloud turnup
  • Review code developed by other engineers and provide feedback to ensure best practices
  • Contribute to existing documentation or educational content and adapt content based on product/program updates and user feedback
  • Triage product or system issues and debug/track/resolve by analyzing the sources of issues and the impact on hardware, network, or service operations and quality
  • Lead design reviews with peers and stakeholders to decide among available technologies

Requirements For Site Reliability Engineer

  • Bachelor's degree in Computer Science, a related field, or equivalent practical experience
  • 2 years of experience with data structures/algorithms and software development in one or more programming languages (Java, Python, Go, C, C++)
  • Experience working in computing, distributed systems, storage or networking
  • Expertise in designing, analyzing, and troubleshooting large-scale distributed systems
  • Ability to debug, optimize code, and to automate routine tasks
  • Excellent communication and problem-solving skills

