Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Annapurna Labs designs silicon and software that accelerates innovation for AWS cloud solutions.
$129,300 - $223,600
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS

Description

Annapurna Labs, fully integrated into AWS since its acquisition in 2015, is seeking a Senior Machine Learning Engineer for its Distributed Training team. The role centers on AWS Neuron, the complete software stack for AWS Trainium and Inferentia, AWS's cloud-scale ML accelerators. The position involves working with state-of-the-art ML models, including large language models (LLMs) such as GPT and Llama, as well as Stable Diffusion and Vision Transformers. The team operates at the intersection of hardware and software, building solutions that push the boundaries of what is possible in cloud computing.

The role requires expertise in distributed training libraries such as PyTorch FSDP, DeepSpeed, and NeMo, with a focus on extending their capabilities to Neuron-based systems. You'll collaborate with cross-functional teams, including chip architects and compiler engineers, to optimize performance on AWS custom silicon. The position offers significant growth opportunities within AWS's innovative culture, which values diversity, continuous learning, and work-life harmony.

AWS, the world's leading cloud platform, provides an environment where you'll work on challenging problems that affect businesses worldwide. The company offers competitive compensation, including base pay ranging from $129,300 to $223,600 depending on location, plus equity and comprehensive benefits. This is an opportunity to join a team that values knowledge-sharing, mentorship, and an inclusive culture while working on technology that shapes the future of cloud computing.


Responsibilities

  • Lead efforts to build distributed training support into PyTorch and JAX using XLA
  • Optimize models to achieve peak performance on AWS custom silicon
  • Work with chip architects, compiler engineers, and runtime engineers
  • Develop, enable, and tune the performance of ML model families
  • Create, build, and tune distributed training solutions on Trainium instances

Requirements

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship design or architecture experience
  • Experience programming in at least one programming language
  • Experience with training large ML models using Python
  • Strong software development skills
  • Solid foundation in machine learning

Benefits

  • Medical, financial, and other benefits
  • Equity compensation
  • Sign-on payments
  • Mentorship and career growth opportunities
  • Work-life harmony


Related Jobs at Amazon

Sr. Software Dev Engineer, Amazon Robotics

Senior Software Engineer role at Amazon Robotics focusing on developing foundation models for robotic mobility and manipulation, offering competitive compensation and comprehensive benefits.

Senior Software Development Engineer - GenAI, Amazon Ads - Creative X

Senior Software Engineer role at Amazon Ads focusing on building scalable Generative AI infrastructure and platforms, working with cutting-edge AI technologies and research teams.

Software Development Engineer, Prime Video Search

Senior Software Engineer role at Amazon Prime Video focusing on search engineering and machine learning infrastructure development.

Senior Software Development Engineer, Amazon Ads

Senior Software Engineer role at Amazon Ads focusing on ML/AI-powered advertising solutions and audience targeting capabilities.

Senior Machine Learning Engineer, AGI Foundations

Senior ML Engineer role at Amazon's AGI team, focusing on developing cutting-edge inference solutions for Generative AI models.