Site Reliability Engineer

Google

Google is a global technology company that builds and runs large-scale, massively distributed systems.

Dublin, Ireland

Site Reliability

Mid-Level Software Engineer

In-Person

5,000+ Employees

2+ years of experience

Enterprise SaaS · Cloud

Description For Site Reliability Engineer

Site Reliability Engineering (SRE) at Google combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you'll ensure Google Cloud's services—both internally critical and externally-visible systems—maintain reliability and appropriate uptime for customer needs while driving continuous improvement. The role involves managing complex challenges of scale unique to Google Cloud, utilizing expertise in coding, algorithms, complexity analysis, and large-scale system design.

The position emphasizes optimizing existing systems, building infrastructure, and automating processes. Google's SRE culture values diversity, intellectual curiosity, problem-solving, and openness. The organization brings together people with diverse backgrounds and perspectives, encouraging collaboration and risk-taking in a blame-free environment.

You'll work on meaningful projects with self-direction while receiving support and mentorship for growth. Key responsibilities include managing project priorities, deadlines, and deliverables, as well as designing, developing, testing, deploying, maintaining, and enhancing software solutions. The role involves working with network telemetry services, implementing automated troubleshooting, improving monitoring systems, and ensuring service reliability through well-defined SLOs.

This is an excellent opportunity for engineers passionate about large-scale systems, automation, and reliability. You'll collaborate with partner teams, shape technical plans, and directly impact the reliability of Google's production network while working in a supportive, growth-oriented environment.

Last updated a day ago

Responsibilities For Site Reliability Engineer

Contribute to and land projects such as Automated Troubleshooting, Better Monitoring and Service Level Objectives (SLOs), and Podification of services
Identify needs across network telemetry services. Propose, build, and launch cross-service solutions
Motivate improvements in the team's systems, the infrastructure around them, and the network telemetry ecosystem
Engage with partner teams and users to make systems reliable with relatable SLOs
Guide technical plans and goals towards creating reliable systems
Operate the network telemetry systems of Google's production network

Requirements For Site Reliability Engineer

Python

Java

Kubernetes

Bachelor's degree in Computer Science, a related field, or equivalent practical experience
2 years of experience with data structures/algorithms and software development in one or more programming languages
Experience in software engineering with knowledge of Google's production network
Experience researching, proposing, and launching engineering solutions
Ability to collaborate with current and prospective partner teams, product, and users
Excellent collaboration skills with technical goals
Excellent leadership skills

Benefits For Site Reliability Engineer

Medical Insurance

Parental Leave

Visa Sponsorship

Equal opportunity employer
Accommodation for special needs
Global work environment

Google

How do you find all common parents for a set of nodes on a tree?

Data Structures & Algorithms · Medium

Given a tree data structure, and a set of nodes within that tree, how do you find all of their common parents? For example, consider a tree where node A is the root, and it has children B and C. Node B has children D and E, and node C has a child F. If the input set of nodes is {D, E, F}, then the common parents would be {B, C, A}. If the input set of nodes is {D, E}, then the common parents would be {B, A}. Design an algorithm to efficiently find these common parents, considering potential optimizations for different tree structures and sizes.

Trees

Recursion

Graphs
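One possible approach to the question above is sketched here in Python. It assumes the tree is given as a child-to-parent map, which the question does not specify, and the function names are hypothetical. The wording "common parents" normally means ancestors shared by every node in the set, which for {D, E, F} would be only A; the answer {B, C, A} quoted in the question matches the union of all ancestors instead. The sketch computes both so the interpretation can be confirmed with the interviewer.

```python
from typing import Dict, Hashable, Iterable, List, Set

# Assumed input format: each node maps to its parent; the root has no entry.
ParentMap = Dict[Hashable, Hashable]


def ancestors(parent: ParentMap, node: Hashable) -> List[Hashable]:
    """Return the ancestors of `node`, nearest first, by walking parent links."""
    chain = []
    current = parent.get(node)
    while current is not None:
        chain.append(current)
        current = parent.get(current)
    return chain


def common_ancestors(parent: ParentMap, nodes: Iterable[Hashable]) -> List[Hashable]:
    """Ancestors shared by every node in `nodes`, ordered nearest first."""
    nodes = list(nodes)
    if not nodes:
        return []
    first = ancestors(parent, nodes[0])
    shared = set(first)
    for node in nodes[1:]:
        shared &= set(ancestors(parent, node))
    # Keep the nearest-first order of the first node's ancestor chain.
    return [a for a in first if a in shared]


def all_ancestors(parent: ParentMap, nodes: Iterable[Hashable]) -> Set[Hashable]:
    """Union of the ancestors of every node in `nodes`."""
    seen: Set[Hashable] = set()
    for node in nodes:
        current = parent.get(node)
        # Stop early once we reach an ancestor whose chain was already recorded.
        while current is not None and current not in seen:
            seen.add(current)
            current = parent.get(current)
    return seen


# The tree from the question: A is the root, B and C are its children,
# D and E are children of B, and F is a child of C.
tree = {"B": "A", "C": "A", "D": "B", "E": "B", "F": "C"}
```

With this map, `common_ancestors(tree, ["D", "E"])` gives B and A, while `all_ancestors(tree, ["D", "E", "F"])` gives A, B, and C. Each walk costs time proportional to the depth of the node, so for very deep trees or many repeated queries a precomputed structure (for example, binary lifting) may be worth discussing as an optimization.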

Google

How would you assign ACLs to users or groups?

System Design · Medium

Let's discuss Access Control Lists (ACLs). Imagine you're designing a system where you need to control access to various resources. How would you approach assigning ACLs to users and groups? Be specific. For example, consider a scenario with files, directories, and applications. How would you define permissions (read, write, execute, delete) and associate them with individual users (like 'john.doe') or groups (like 'developers' or 'administrators')? What different strategies would you evaluate for managing ACLs, and what are the tradeoffs between them in terms of security, performance, and ease of administration? For example, would you use an identity-based approach, a role-based approach, or a combination of both? Consider also how you would handle inheritance of ACLs in a hierarchical structure, such as a file system. How would you prevent privilege escalation and ensure that users only have the access they need? Finally, how would you audit ACL changes and monitor access attempts to detect potential security breaches?

Graphs

Dynamic Programming
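To make the discussion concrete, the following is a minimal Python sketch of one possible design, not a definitive answer: a default-deny ACL store that grants permissions to users or groups, inherits grants down a file-system-like hierarchy, and records every change and access check in an audit log. The class name `AclStore`, its methods, and the `user:` / `group:` prefixes are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, List, Set, Tuple


@dataclass
class AclStore:
    # resource path -> principal ("user:..." or "group:...") -> permissions
    grants: Dict[str, Dict[str, Set[str]]] = field(default_factory=dict)
    # user name -> names of the groups the user belongs to
    groups: Dict[str, Set[str]] = field(default_factory=dict)
    # append-only record of grants and access checks
    audit_log: List[Tuple] = field(default_factory=list)

    def grant(self, path: str, principal: str, *perms: str) -> None:
        """Grant permissions on `path` to a user or group and audit the change."""
        self.grants.setdefault(path, {}).setdefault(principal, set()).update(perms)
        self.audit_log.append(("grant", path, principal, tuple(sorted(perms))))

    def is_allowed(self, user: str, path: str, perm: str) -> bool:
        """Default deny: allow only if the user, or one of the user's groups,
        holds `perm` on the path itself or on any parent directory."""
        principals = {f"user:{user}"}
        principals |= {f"group:{g}" for g in self.groups.get(user, set())}
        node = PurePosixPath(path)
        allowed = False
        for candidate in [node, *node.parents]:
            entries = self.grants.get(str(candidate), {})
            if any(perm in entries.get(p, set()) for p in principals):
                allowed = True
                break
        self.audit_log.append(("check", path, user, perm, allowed))
        return allowed


# Example setup using the user and group names from the question.
acl = AclStore()
acl.groups["john.doe"] = {"developers"}
acl.grant("/src", "group:developers", "read", "write")
```

Here `acl.is_allowed("john.doe", "/src/app/main.py", "read")` is allowed through group membership and directory inheritance, while a `"delete"` check for the same user is denied because nothing grants it. The sketch leaves out explicit deny entries, role definitions, and persistence, all of which are tradeoffs the question invites you to discuss.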

Google

Tell me about a time you had to sign a non-disclosure agreement (NDA). What were the key terms, and how did you ensure compliance?

Behavioral

Tell me about a time you had to sign a non-disclosure agreement (NDA). What were the key terms, and how did you ensure compliance while working on the project? Consider these points in your answer:

Context: Briefly describe the project and the purpose of the NDA. Who were the parties involved, and what information was being protected?
Key Terms: What were the most important clauses or restrictions outlined in the NDA? This could include limitations on sharing information, reverse engineering, or using the protected information for purposes outside the scope of the agreement.
Compliance Measures: What specific steps did you take to ensure you and your team complied with the NDA? Did you implement any specific procedures, such as data encryption, access controls, or regular training?
Challenges: Did you face any challenges in adhering to the NDA? If so, how did you overcome them? Did you need to seek clarification on any terms or consult with legal counsel?
Outcome: What was the result of the project, and how did the NDA contribute to its success? Did the NDA effectively protect the confidential information, and were there any lessons learned from the experience?

For example, perhaps you worked on a project involving a new algorithm for a company. The NDA might have restricted you from sharing the algorithm with anyone outside the project team, reverse engineering it, or using it for any other projects. You might have implemented access controls to limit who could access the algorithm's source code and conducted regular training sessions to remind the team of their obligations under the NDA. You should talk about how you proactively avoided violations and made sure your actions were fully within the legal scope of the agreement.

Jobs Related To Google Site Reliability Engineer

Software Developer III, Site Reliability Development, Google Cloud

Google

Site Reliability Developer role at Google Cloud focusing on building and maintaining large-scale distributed systems with competitive compensation and growth opportunities.

Technical Program Manager, Site Reliability Engineering

Google

Technical Program Manager position at Google's SRE team, leading infrastructure and service delivery projects with focus on operational excellence and cross-functional collaboration.

Program Manager, Platforms and Devices Site Reliability Engineering

Google

Lead complex technical programs for Google's Platforms and Devices SRE team, managing cross-functional projects and driving organizational efficiency.

Software Engineer III, Shopping Build Site Reliability Engineer

Google

Site Reliability Engineer role at Google focusing on building and maintaining large-scale distributed systems for Google Cloud services.

Site Reliability Engineer, Ads Quality Infrastructure

Google

Site Reliability Engineer position at Google focusing on Ads Quality Infrastructure, requiring expertise in distributed systems and software development with 2+ years of experience.