Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS is the world's most comprehensive and broadly adopted cloud platform, pioneering cloud computing and continuous innovation.
$151,300 - $261,500
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Annapurna Labs, now fully integrated into AWS after its 2015 acquisition, is seeking a Senior Machine Learning Engineer for its Distributed Training team within AWS Neuron. This role focuses on developing and optimizing machine learning solutions for AWS's custom silicon accelerators, Trainium and Inferentia.

The position involves working with cutting-edge ML technologies, including large language models (LLMs) such as GPT and Llama, as well as Stable Diffusion and Vision Transformers. You'll be at the intersection of hardware and software, collaborating with chip architects and compiler engineers to push the boundaries of distributed training solutions.

As part of AWS, you'll join a team that values knowledge-sharing, mentorship, and career growth. The role offers competitive compensation ($151,300 - $261,500 based on location) and comprehensive benefits. AWS's inclusive culture celebrates diversity through employee-led affinity groups and ongoing learning experiences.

The ideal candidate will bring strong software development skills, deep ML expertise, and experience with frameworks like PyTorch, JAX, or TensorFlow. You'll be working on AWS Neuron, the complete software stack for AWS's cloud-scale ML accelerators, with a direct impact on how customers use AWS's infrastructure for their ML needs.

This is an opportunity to shape the future of machine learning infrastructure at AWS, working with a team that's dedicated to innovation and technical excellence. The role combines hands-on technical leadership with the chance to mentor others and contribute to AWS's mission of being Earth's Best Employer.

Last updated 6 hours ago

Responsibilities For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts to build distributed training support into PyTorch and JAX using XLA
  • Optimize models to achieve peak performance on AWS custom silicon
  • Work with chip architects, compiler engineers and runtime engineers
  • Create, build and tune distributed training solutions with Trainium instances
  • Develop, enable and tune the performance of ML model families
  • Work with FSDP, DeepSpeed, NeMo and other distributed training libraries

Requirements For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Python
Java
  • Bachelor's degree in computer science or equivalent
  • 5+ years of non-internship professional software development experience
  • 5+ years of programming experience with at least one software programming language
  • 5+ years of experience leading design or architecture
  • 5+ years of full software development life cycle experience
  • Experience as a mentor, tech lead or leading an engineering team
  • Experience in machine learning, data mining, information retrieval, statistics or natural language processing

Benefits For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Medical Insurance
  • Vision Insurance
  • Dental Insurance
  • 401k

