Annapurna Labs, now fully integrated with AWS after its 2015 acquisition, is seeking a Senior Machine Learning Engineer for its Distributed Training team within AWS Neuron. This role focuses on developing and optimizing distributed training solutions for AWS's cloud-scale Machine Learning accelerators, Trainium and Inferentia. The position involves working with cutting-edge ML models, including LLMs like GPT and Llama, as well as Stable Diffusion and Vision Transformers.
The role requires expertise in distributed training libraries such as FSDP, DeepSpeed, and NeMo, along with strong Python skills. You'll collaborate with cross-functional teams, including chip architects and compiler engineers, to push the boundaries of ML training performance on AWS custom silicon.
AWS values diverse experiences and maintains an inclusive culture through employee-led affinity groups and ongoing learning experiences. The team emphasizes knowledge-sharing and mentorship, making it a supportive environment for professional growth. Work-life harmony is prioritized, with the aim of supporting success both at work and at home.
The position offers competitive compensation ranging from $151,300 to $261,500 per year, depending on location and experience, plus additional benefits including equity and sign-on payments. This is an opportunity to work at the forefront of ML infrastructure, developing solutions that enable customers to solve previously unimaginable technical challenges.
As part of Annapurna Labs, you'll be working with the team responsible for critical AWS infrastructure components, including AWS Nitro, Graviton, and ML accelerators. The role combines deep technical expertise with leadership opportunities, making it well suited to experienced engineers passionate about advancing ML technology at scale.