Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS is the world's most comprehensive and broadly adopted cloud platform, pioneering cloud computing and continuous innovation.
$151,300 - $261,500
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Annapurna Labs, now fully integrated with AWS after its 2015 acquisition, is seeking a Senior Machine Learning Engineer for its Distributed Training team within AWS Neuron. This role focuses on developing and optimizing distributed training solutions for AWS's cloud-scale Machine Learning accelerators, Trainium and Inferentia. The position involves working with cutting-edge ML models, including LLMs such as GPT and Llama as well as Stable Diffusion and Vision Transformers.

The role requires expertise in distributed training libraries such as FSDP, DeepSpeed, and NeMo, along with strong Python skills. You'll collaborate with cross-functional teams, including chip architects and compiler engineers, to push the boundaries of ML training performance on AWS custom silicon.

AWS values diverse experiences and maintains an inclusive culture through employee-led affinity groups and ongoing learning experiences. The team emphasizes knowledge-sharing and mentorship, making it a strong environment for professional growth. Work-life harmony is prioritized, supporting success both at work and at home.

The position offers competitive compensation ranging from $151,300 to $261,500 per year, depending on location and experience, plus additional benefits including equity and sign-on payments. This is an opportunity to work at the forefront of ML infrastructure, developing solutions that enable customers to solve previously unimaginable technical challenges.

As part of Annapurna Labs, you'll be working with the team responsible for critical AWS infrastructure components including AWS Nitro, Graviton, and ML Accelerators. The role combines deep technical expertise with leadership opportunities, making it perfect for experienced engineers passionate about advancing ML technology at scale.

Responsibilities For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts to build distributed training support into PyTorch and JAX using XLA
  • Optimize models to achieve peak performance on AWS custom silicon
  • Work with chip architects, compiler engineers and runtime engineers
  • Create, build and tune distributed training solutions with Trainium instances
  • Develop and enable performance tuning of ML model families including LLMs

Requirements For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Python
Java
  • Bachelor's degree in computer science or equivalent
  • 5+ years of non-internship professional software development experience
  • 5+ years of programming with at least one software programming language
  • 5+ years of leading design or architecture experience
  • 5+ years of full software development life cycle experience
  • Experience as a mentor, tech lead or leading an engineering team
  • Experience in machine learning, data mining, statistics or natural language processing

Benefits For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Medical, financial, and other benefits
  • Equity compensation
  • Sign-on payments
  • Mentorship and career growth opportunities
  • Work-life harmony
  • Inclusive team culture
