Annapurna Labs, now fully integrated with AWS after its 2015 acquisition, is seeking a Senior Machine Learning Engineer for its Distributed Training team. This role focuses on AWS Neuron, the complete software stack for the AWS Trainium and Inferentia cloud-scale machine learning accelerators. The position involves working with current ML workloads, including Large Language Models such as GPT and Llama, as well as Stable Diffusion and Vision Transformers.
The role demands expertise in distributed training libraries such as FSDP, DeepSpeed, and NeMo, with a focus on extending these libraries to Neuron-based systems. You'll collaborate with cross-functional teams, including chip architects and compiler engineers, to optimize performance on AWS custom silicon.
AWS values diverse experiences and maintains an inclusive culture through employee-led affinity groups and ongoing learning experiences. The team emphasizes knowledge-sharing and mentorship, supporting both professional and personal growth. AWS offers competitive compensation, including equity and comprehensive benefits, reflecting its commitment to being Earth's Best Employer.
The position offers exposure to new technology in cloud computing and machine learning, on products that directly affect AWS's infrastructure. You'll be part of a team that has delivered products such as AWS Nitro, Graviton, and ML Accelerators, contributing to solutions that help customers tackle previously unimaginable technical challenges.
This role presents an exceptional opportunity for experienced engineers passionate about machine learning and distributed systems to work at the forefront of cloud technology, while enjoying a supportive, inclusive work environment that values work-life harmony and continuous learning.