Site Reliability Engineer, CloudWatch Infrastructure

World's most comprehensive and broadly adopted cloud platform, pioneering cloud computing services.
Site Reliability
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
Enterprise SaaS · Cloud

Description For Site Reliability Engineer, CloudWatch Infrastructure

AWS CloudWatch is seeking a Systems Development Engineer to join its Infrastructure team, which operates one of the largest time-series data stores, monitoring over 13 quadrillion metric observations monthly. This role combines infrastructure management, automation, and large-scale systems operations. CloudWatch is a crucial component of AWS Utility Computing, providing essential monitoring, application tracking, and log analytics services.

The position offers unique challenges in managing tens of thousands of servers and requires expertise in automation, distributed systems, and infrastructure optimization. You'll work with cutting-edge technology, solving problems of massive scale while ensuring operational excellence. The team emphasizes work-life balance and provides an environment conducive to continuous learning and innovation.

As part of AWS, you'll contribute to a service that processes trillions of events monthly and supports some of the world's largest web services. The role combines technical expertise with strategic thinking, requiring both hands-on engineering skills and the ability to drive long-term improvements. AWS values diverse experiences and provides extensive career development opportunities, including mentorship programs and ongoing learning resources.

The position offers exposure to various AWS services, from foundational offerings like S3 and EC2 to newer innovations. You'll work in a collaborative environment that encourages knowledge sharing and professional growth while maintaining a strong focus on work-life harmony. The role is ideal for someone passionate about infrastructure automation, distributed systems, and operating at massive scale.

Last updated 23 days ago

Responsibilities For Site Reliability Engineer, CloudWatch Infrastructure

  • Coordinate with internal teams to identify infrastructure pain points and eliminate them through automation
  • Contribute to the team's forward-looking vision
  • Help improve operational excellence by reducing technical debt for the team
  • Manage and operate one of the largest fleets inside AWS
  • Work on large-scale automation projects across various technical areas

Requirements For Site Reliability Engineer, CloudWatch Infrastructure

Python
Linux
Go
Java
  • Experience in automating, deploying, and supporting large-scale infrastructure
  • Experience programming with at least one modern language such as C++, C#, Java, Python, Golang, PowerShell, Ruby
  • Experience with Linux/Unix
  • Experience with CI/CD pipelines and build processes

Benefits For Site Reliability Engineer, CloudWatch Infrastructure

Medical Insurance
Dental Insurance
Vision Insurance
  • Flexible work hours
  • Work-life balance focus
  • Career development opportunities
  • Mentorship programs
  • Diverse and inclusive workplace
  • Knowledge-sharing environment
