Annapurna Labs, acquired by AWS in 2015, is at the forefront of silicon and software innovation for cloud computing. This senior role sits on the AWS Neuron team and focuses on distributed training for machine learning applications. The position involves developing and optimizing solutions for AWS Trainium and Inferentia, AWS's cloud-scale ML accelerators.
As a Senior Machine Learning Engineer in the Distributed Training team, you'll be responsible for implementing and fine-tuning various ML model families, including Large Language Models like GPT and Llama, as well as Stable Diffusion and Vision Transformers. You'll work closely with chip architects and software engineers to create cutting-edge distributed training solutions.
The role demands expertise in Python and distributed training libraries such as FSDP, DeepSpeed, and NeMo. You'll be joining a team that values knowledge-sharing, mentorship, and diverse experiences. AWS provides a supportive environment focused on work-life harmony and career growth.
The position offers competitive compensation ranging from $151,300 to $261,500 based on location and experience, plus additional benefits including equity and comprehensive medical coverage. You'll be part of an inclusive culture that celebrates diversity and supports ongoing learning through employee-led affinity groups and various learning experiences.
This is an opportunity to work on unprecedented technical challenges, developing solutions that help customers innovate and change the world. The role combines deep technical expertise in machine learning with the chance to work on custom hardware acceleration, making it ideal for those passionate about both ML and high-performance computing.