Site Reliability Engineer, Managed Operations

Amazon

World's most comprehensive and broadly adopted cloud platform, pioneering cloud computing services.

Berlin, Germany

Site Reliability

Senior Software Engineer

In-Person

5,000+ Employees

5+ years of experience

Enterprise SaaS · Cloud

Description For Site Reliability Engineer, Managed Operations

AWS is launching its first European Sovereign Cloud (ESC), a groundbreaking initiative in Utility Computing. As a Site Reliability Engineer in the AWS Managed Operations team, you'll play a crucial role in building and leading operations for high-availability AWS services like EC2, S3, Dynamo, Lambda, and Bedrock, specifically for EU customers.

The role splits evenly between operating production systems and implementing long-term improvements. You'll be part of AWS Utility Computing (UC), which provides foundational services and continuous product innovations. Your responsibilities include overseeing the ESC launch in 2025, collaborating with global teams, and ensuring optimal service performance.

Working at AWS means joining the world's leading cloud platform provider, where innovation is constant. The company values diverse experiences and fosters an inclusive culture through employee-led affinity groups and ongoing learning opportunities. You'll benefit from extensive mentorship, career growth resources, and a strong work-life harmony philosophy.

The ideal candidate brings experience with modern programming languages (Java, TypeScript, Python, Ruby), Linux systems, and automation. You'll work in Berlin, Germany, with relocation support available within the EU. This role offers the unique opportunity to shape the future of cloud computing in Europe while working with cutting-edge technologies and world-class teams.

Join AWS to be part of a transformative project that combines technical excellence with customer obsession, all while maintaining high standards for security and reliability in cloud computing.

Last updated 3 months ago

Responsibilities For Site Reliability Engineer, Managed Operations

Oversee the launch of the European Sovereign Cloud (ESC) in 2025
Operate production systems (50% of time)
Make long-term improvements to reliability, availability, and performance (50% of time)
Perform root cause analysis of deployment failures
Execute highly sensitive time-critical changes to production
Participate in design discussions and code reviews
Participate in on-call rotations
Collaborate with global AWS teams
Ensure high-availability experience for EU customers

Requirements For Site Reliability Engineer, Managed Operations

Experience in at least one modern programming language such as Java, TypeScript, Python, or Ruby
Familiarity with Linux, using the command line and basic administration
Experience with computer networking fundamentals
Experience with scripting and automation
Fluency in written and spoken English
Legal right to work in Germany

Benefits For Site Reliability Engineer, Managed Operations

Relocation Benefits

Visa Sponsorship

Relocation support within EU
Mentorship and career growth opportunities
Work-life harmony
Employee-led affinity groups
Inclusive team culture
Continuous learning opportunities

Amazon

Given an array representing server processing powers, merge consecutive servers if the next server's power is greater than the current. What is the final array of server powers after all possible merges are done?

Data Structures & Algorithms · Hard

You are given an array of servers represented by their processing power. You want to join consecutive servers together to form larger, more powerful servers. You can only join server i with server i+1 if the processing power of server i+1 is strictly greater than the processing power of server i. This joining process can cascade: if you join i and i+1, and the new combined processing power is less than the processing power of server i+2, you can then join the combined server with i+2, and so on. The goal is to form as many of these larger servers as possible.

For example:

Consider the array [1, 2, 3, 4, 5]. Scanning left to right, you can join 1 and 2 to form 3. You cannot join 3 and 3, because the next server must be strictly greater. You can join 3 and 4 to form 7. You cannot join 7 and 5. The final servers will be [3, 7, 5].

Consider the array [5, 4, 3, 2, 1]. You cannot join any servers because the processing power is always decreasing.

Consider the array [1, 3, 2, 4, 1, 5]. You can join 1 and 3 to get 4. You cannot join 4 and 2. You can join 2 and 4 to get 6. You cannot join 6 and 1. You can join 1 and 5 to get 6. The final servers will be [4, 6, 6].

Write a function that takes an array of integers representing server processing powers and returns the final array of server processing powers after performing all possible joins.
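One possible approach, sketched in Python. It assumes the joins are applied in a single left-to-right pass, as in the [1, 3, 2, 4, 1, 5] walkthrough, and that a finished server is never revisited. The function name merge_servers is illustrative and not part of the question.

```python
def merge_servers(powers):
    """Greedy single pass: grow the current server while the next one is strictly larger."""
    merged = []
    current = None  # combined power of the server being built
    for power in powers:
        if current is not None and power > current:
            # Next server is strictly more powerful: join it (this may cascade).
            current += power
        else:
            # Cannot join: close the current server and start a new one.
            if current is not None:
                merged.append(current)
            current = power
    if current is not None:
        merged.append(current)
    return merged


print(merge_servers([1, 3, 2, 4, 1, 5]))  # [4, 6, 6]
print(merge_servers([5, 4, 3, 2, 1]))     # [5, 4, 3, 2, 1]
```

Under this assumption each element is visited once, so the pass runs in O(n) time with O(n) space for the output.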

Arrays

Greedy Algorithms

Amazon

Design a system to clean and transform inconsistent customer data from various sources into a consistent format for analysis, addressing data cleaning, transformation, scalability, and error handling. Provide code examples for data cleaning and transformation steps.

System Design · Medium

Let's explore a scenario involving data transformation. Imagine you're receiving a stream of customer data from various sources. This data includes customer IDs, names, email addresses, and purchase histories. However, the data is inconsistent: some sources use different formats for dates, some have missing fields, and others use abbreviations for states. Your task is to design a robust and efficient system to clean and transform this data into a consistent format suitable for analysis. Specifically:

Data Cleaning: How would you handle missing values, inconsistent date formats (e.g., MM/DD/YYYY vs. YYYY-MM-DD), and variations in state abbreviations (e.g., CA vs. California)? Provide code examples (Python is preferred) demonstrating how you would address these issues.

Data Transformation: How would you transform the data to ensure consistency? For example, you might need to convert all dates to a standard format, expand state abbreviations to their full names, and ensure all customer IDs are in a uniform format.

Scalability: How would you design the system to handle a large volume of data (e.g., millions of records per day)? Consider the technologies and architectures you would use to ensure scalability and performance. Think about potential bottlenecks and how to address them.

Error Handling: Describe how you would implement error handling and logging to identify and address data quality issues. What metrics would you track to monitor the quality of the transformed data?

For instance, suppose you receive the following data snippets:

Source 1: {customer_id: 123, name: Alice, email: alice@example.com, purchase_date: 01/01/2023, state: CA}

Source 2: {CustomerID: 456, Name: Bob, Email: bob@example.com, PurchaseDate: 2023-01-01, State: California}

How would your system handle these variations and transform them into a unified format like this:

{customer_id: 123, name: Alice, email: alice@example.com, purchase_date: 2023-01-01, state: California}
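The cleaning and transformation steps could be sketched in Python roughly as follows. This is a minimal illustration, not a full design: the helper names (normalize_record, parse_date), the key mapping, and the deliberately partial state table are all assumptions made for the example. It uses only the standard library.

```python
from datetime import datetime

# Assumed mapping from source keys (lowercased, underscores removed) to unified field names.
CANONICAL_FIELDS = {
    "customerid": "customer_id",
    "name": "name",
    "email": "email",
    "purchasedate": "purchase_date",
    "state": "state",
}
# Deliberately partial lookup table, for illustration only.
STATE_NAMES = {"CA": "California", "NY": "New York", "TX": "Texas"}
# Date formats named in the question: YYYY-MM-DD and MM/DD/YYYY.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(value):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_record(raw):
    """Map one raw source record to the unified schema; missing fields become None."""
    record = {field: None for field in CANONICAL_FIELDS.values()}
    for key, value in raw.items():
        field = CANONICAL_FIELDS.get(key.replace("_", "").lower())
        if field is not None:  # unknown keys are dropped in this sketch
            record[field] = value
    if record["purchase_date"] is not None:
        record["purchase_date"] = parse_date(str(record["purchase_date"]))
    if isinstance(record["state"], str):
        state = record["state"].strip()
        record["state"] = STATE_NAMES.get(state.upper(), state)
    return record


source_1 = {"customer_id": 123, "name": "Alice", "email": "alice@example.com",
            "purchase_date": "01/01/2023", "state": "CA"}
print(normalize_record(source_1))
```

Returning None for unparseable dates and missing fields, rather than raising, keeps a single bad record from stopping the stream. A fuller design would also log those cases and count them as a data-quality metric.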

Database Problems

Arrays

Strings

Amazon

Describe a time you disagreed with a teammate

Behavioral

Describe a time when you and a teammate had a fundamental disagreement about a project. What was the disagreement about? How did you approach resolving the disagreement? What was the outcome, and what did you learn from the experience?
