Join AWS's Machine Learning Infrastructure team as a Software Development Engineer at Annapurna Labs, an AWS subsidiary dedicated to advancing ML and HPC capabilities in the cloud. This role focuses on building and maintaining the infrastructure that monitors and optimizes large-scale testing workloads.
As a key member of the team, you'll work with technologies such as AWS Trainium, AWS Neuron, and Elastic Fabric Adapter (EFA). Your responsibilities include developing CI/CD automation, implementing ML and HPC benchmarks, and building monitoring systems with Amazon Managed Grafana and Amazon Athena.
In this position you'll use TypeScript and the AWS CDK for infrastructure as code, manage Slurm-based scheduling systems, and develop new solutions for cluster management. You'll be part of a team focused on making AWS the most cost-effective platform for AI at scale.
The role combines software engineering with ML infrastructure expertise and requires strong skills in Python, TypeScript, and Linux systems. It is based in Seattle, WA, with a salary range of $129,300 to $223,600, depending on experience and location.
This is an ideal position for someone who enjoys working with the latest ML technologies, has a passion for automation and infrastructure, and wants to shape how AI workloads are deployed at scale. Your work will directly influence the future of machine learning in the cloud.