Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS is the world's most comprehensive and broadly adopted cloud platform, pioneering cloud computing and continuous innovation.
$151,300 - $261,500
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS · Cloud

Description For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Annapurna Labs, now fully integrated with AWS after its 2015 acquisition, is seeking a Senior Machine Learning Engineer for its Distributed Training team. The role centers on AWS Neuron, the complete software stack for AWS Trainium and Inferentia, AWS's cloud-scale machine learning accelerators. The position involves working with state-of-the-art models, including large language models such as GPT and Llama, as well as Stable Diffusion and Vision Transformers. It requires expertise in distributed training libraries such as FSDP, DeepSpeed, and NeMo, along with close collaboration with chip architects and compiler engineers. The team values knowledge-sharing, mentorship, and career growth, and offers complex technical challenges in large-scale ML training and cloud computing.

AWS offers a comprehensive compensation package, including competitive base pay, equity, and extensive benefits, and fosters an inclusive culture that celebrates diversity and work-life harmony. This position is an opportunity to shape the future of machine learning infrastructure at one of the world's leading cloud platforms.


Responsibilities For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts to build distributed training support into PyTorch and JAX using XLA
  • Optimize models to achieve peak performance on AWS custom silicon
  • Work with chip architects, compiler engineers and runtime engineers
  • Create, build, and tune distributed training solutions on Trainium instances
  • Develop, enable, and tune the performance of ML model families

Requirements For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Python
Java
  • Bachelor's degree in computer science or equivalent
  • 5+ years of non-internship professional software development experience
  • 5+ years of programming experience
  • 5+ years of leading design or architecture experience
  • 5+ years of full software development life cycle experience
  • Experience as a mentor, tech lead or leading an engineering team
  • Experience in machine learning, data mining, statistics or natural language processing

Benefits For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Medical Insurance
Equity
Mental Health Assistance
  • Medical, financial, and other benefits
  • Equity compensation
  • Sign-on payments
  • Mentorship and career growth opportunities
  • Work-life harmony
  • Inclusive team culture


Jobs Related To Amazon Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Sr. Machine Learning Engineer, Routing and Planning

Senior Machine Learning Engineer position at Amazon focusing on AI solutions for last-mile delivery optimization and routing.

Sr. Software Engineer- AI/ML, AWS Neuron Apps

Senior ML Engineer role at AWS working on Neuron software stack for cloud-scale machine learning accelerators, focusing on distributed training and model optimization.

Senior Machine Learning Engineer, Generative AI

Senior ML Engineer role at Amazon focusing on LLM runtime systems development, offering competitive compensation and opportunity to work on cutting-edge AI technology.

Senior Machine Learning Engineer, Generative AI

Senior ML Engineer role at Amazon focusing on LLM runtime systems development, offering competitive pay and benefits with opportunities for technical leadership and innovation.

Software Development Engineer, Frontier AI & Robotics

Senior Software Engineer role at Amazon's Frontier AI & Robotics team focusing on ML optimization and robotics systems development.