Site Reliability Engineer, CloudWatch Infrastructure

World's most comprehensive and broadly adopted cloud platform, pioneering cloud computing services.
Site Reliability
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
Enterprise SaaS · Cloud

Description For Site Reliability Engineer, CloudWatch Infrastructure

AWS CloudWatch is seeking a Systems Development Engineer to join its Infrastructure team, which operates one of the largest time-series data stores, monitoring over 13 quadrillion metric observations monthly. This role combines infrastructure management, automation, and large-scale systems operations. CloudWatch, part of AWS Utility Computing, provides critical monitoring and analytics services that major web services rely on worldwide.

The position offers unique challenges in managing tens of thousands of servers while focusing on continuous automation and infrastructure improvement. You'll work with cutting-edge technology in distributed systems, cloud computing, and large-scale infrastructure automation. The team values work-life balance and fosters an inclusive environment that embraces diverse experiences and backgrounds.

As part of AWS, you'll be working with the world's leading cloud platform, contributing to services that power businesses from startups to Global 500 companies. The role offers extensive opportunities for professional growth through mentorship, knowledge-sharing, and career development resources. You'll be part of a team that values innovation, long-term thinking, and technical excellence while maintaining a strong focus on work-life harmony.

The ideal candidate will have strong experience in infrastructure automation, programming skills in modern languages, and a background in Linux/Unix systems. You'll be instrumental in improving operational excellence, reducing technical debt, and driving automation initiatives across the CloudWatch infrastructure.

Last updated 23 days ago

Responsibilities For Site Reliability Engineer, CloudWatch Infrastructure

  • Coordinate with internal teams to identify areas for infrastructure improvement and address them through automation
  • Contribute toward the forward-looking vision for the team
  • Help improve operational excellence by reducing technical debt for the team
  • Manage and operate one of the largest fleets inside AWS
  • Coordinate with multiple internal CloudWatch teams to uncover infrastructure issues and automate them away

Requirements For Site Reliability Engineer, CloudWatch Infrastructure

Python
Linux
Go
Java
  • Experience in automating, deploying, and supporting large-scale infrastructure
  • Experience programming with at least one modern language such as C++, C#, Java, Python, Golang, PowerShell, Ruby
  • Experience with Linux/Unix
  • Experience with CI/CD pipelines and build processes

Benefits For Site Reliability Engineer, CloudWatch Infrastructure

Medical Insurance
Vision Insurance
Dental Insurance
Parental Leave
  • Flexible work hours
  • Work-life balance
  • Mentorship and career growth opportunities
  • Employee-led affinity groups
  • Inclusive team culture
  • Knowledge-sharing resources
