Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS is a leading cloud infrastructure company, with Annapurna Labs serving as AWS's infrastructure provider.
$129,300 - $223,600
Machine Learning
Mid-Level Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS

Description For Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS Neuron is seeking a Software Engineer to join its Machine Learning Applications team, focusing on distributed training solutions. This role is part of Annapurna Labs, acquired by AWS in 2015, which serves as the infrastructure provider for AWS. The position involves working with cutting-edge ML technologies, including large language model families such as Llama, DeepSeek, and GPT.

The role combines software development expertise with machine learning knowledge and requires working with distributed training libraries such as FSDP and DeepSpeed. You'll collaborate with chip architects and compiler engineers to optimize performance on AWS Trainium and Inferentia platforms.

AWS offers a strong culture of inclusion with ten employee-led affinity groups across 190 global chapters. The team values work-life balance and provides flexibility in working hours. There's a strong emphasis on mentorship and knowledge sharing, with opportunities for career growth through challenging projects.

The position offers competitive compensation ranging from $129,300 to $223,600 based on location, plus equity and comprehensive benefits. You'll be part of a team delivering products that impact millions, including AWS Nitro, ENA, EFA, Graviton, and ML Accelerators.

This is an excellent opportunity for someone passionate about machine learning infrastructure who wants to work at the intersection of hardware and software, developing solutions that power the next generation of AI applications at scale.

Responsibilities For Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Develop, enable, and tune the performance of ML model families
  • Build distributed training support into PyTorch, TensorFlow, and JAX
  • Work with chip architects, compiler engineers and runtime engineers
  • Create, build and tune distributed training solutions
  • Ensure the highest performance and maximum efficiency of models running on AWS Trainium and Inferentia silicon

Requirements For Software Engineer- AI/ML, AWS Neuron Distributed Training

Python
  • 3+ years of non-internship professional software development experience
  • 3+ years of non-internship design or architecture experience
  • Experience programming in at least one programming language
  • Deep Learning industry experience
  • Experience with PyTorch/JAX/TensorFlow
  • Bachelor's degree in computer science or equivalent (preferred)

Benefits For Software Engineer- AI/ML, AWS Neuron Distributed Training

Medical Insurance
  • Medical, financial, and other benefits
  • Flexible working hours
  • Mentorship and career growth opportunities
  • Inclusive team culture
  • Employee-led affinity groups
