Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS is the world's most comprehensive and broadly adopted cloud platform, pioneering cloud computing and continuous innovation.
$151,300 - $261,500
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Annapurna Labs, acquired by AWS in 2015, is at the forefront of silicon and software innovation for cloud computing. This senior role sits on the AWS Neuron team and focuses on distributed training for machine learning applications. The position involves developing and optimizing solutions for AWS Trainium and Inferentia, our cloud-scale ML accelerators.

As a Senior Machine Learning Engineer on the Distributed Training team, you'll be responsible for implementing and fine-tuning various ML model families, including Large Language Models like GPT and Llama, as well as Stable Diffusion and Vision Transformers. You'll work closely with chip architects and software engineers to create cutting-edge distributed training solutions.

The role demands expertise in Python and distributed training libraries such as FSDP, DeepSpeed, and NeMo. You'll be joining a team that values knowledge-sharing, mentorship, and diverse experiences. AWS provides a supportive environment focused on work-life harmony and career growth.

The position offers competitive compensation ranging from $151,300 to $261,500 based on location and experience, plus additional benefits including equity and comprehensive medical coverage. You'll be part of an inclusive culture that celebrates diversity and supports ongoing learning through employee-led affinity groups and various learning experiences.

This is an opportunity to work on unprecedented technical challenges, developing solutions that help customers innovate and change the world. The role combines deep technical expertise in machine learning with the chance to work on custom hardware acceleration, making it ideal for those passionate about both ML and high-performance computing.

Last updated 22 days ago

Responsibilities For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts to build distributed training support into PyTorch and JAX using XLA
  • Optimize models for peak performance on AWS custom silicon
  • Work with chip architects, compiler engineers and runtime engineers
  • Develop and tune distributed training solutions with Trainium instances
  • Create and implement distributed training solutions for large-scale ML models
  • Work on implementing massive-scale Large Language Models (LLMs)

Requirements For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Python
Java
  • Bachelor's degree in computer science or equivalent
  • 5+ years of non-internship professional software development experience
  • 5+ years of programming experience
  • 5+ years of leading design or architecture experience
  • 5+ years of full software development life cycle experience
  • Experience as a mentor, tech lead or leading an engineering team
  • Experience in machine learning, data mining, statistics or natural language processing

Benefits For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Medical Insurance
Vision Insurance
Dental Insurance
401k
  • Full range of medical benefits
  • Financial benefits
  • Work-life harmony
  • Career growth opportunities
  • Mentorship programs
