Software Engineer- AI/ML, AWS Neuron Machine Learning Distributed Training, ML Accuracy

Amazon Web Services (AWS) is a leading cloud computing platform; Annapurna Labs serves as its infrastructure provider.
$129,300 - $223,600
Machine Learning
Mid-Level Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS

Description For Software Engineer- AI/ML, AWS Neuron Machine Learning Distributed Training, ML Accuracy

AWS Neuron is seeking a Software Engineer to join its Machine Learning Applications team, focusing on distributed training solutions. This role is part of Annapurna Labs, which was acquired by AWS in 2015 and serves as the infrastructure provider for AWS. The position involves working with cutting-edge ML technologies, including large language models, Stable Diffusion, and Vision Transformers.

The role combines software development expertise with machine learning knowledge, requiring collaboration with chip architects and compiler engineers. You'll be responsible for developing and optimizing distributed training support across multiple frameworks like PyTorch, TensorFlow, and JAX, while working with AWS's custom silicon solutions (Trainium and Inferentia).

AWS offers a highly inclusive culture with ten employee-led affinity groups and innovative benefits. The team values work-life balance and provides flexibility in working hours. There's a strong emphasis on mentorship and knowledge sharing, with opportunities for career growth through challenging projects.

The compensation is competitive, ranging from $129,300 to $223,600 based on location and experience, plus additional benefits and equity opportunities. This is an excellent opportunity for someone passionate about machine learning infrastructure and distributed systems who wants to impact millions of users worldwide while working with cutting-edge technology at scale.

Last updated 2 days ago

Responsibilities For Software Engineer- AI/ML, AWS Neuron Machine Learning Distributed Training, ML Accuracy

  • Lead efforts to build distributed training support into PyTorch, TensorFlow, and JAX
  • Work with chip architects, compiler engineers and runtime engineers
  • Create, build and tune distributed training solutions
  • Tune models for highest performance on AWS Trainium and Inferentia silicon
  • Develop and enable performance tuning of ML model families
  • Work with large language models such as Llama, DeepSeek, and GPT

Requirements For Software Engineer- AI/ML, AWS Neuron Machine Learning Distributed Training, ML Accuracy

Python
Java
TypeScript
  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship design or architecture experience
  • Experience programming with at least one software programming language
  • Deep Learning industry experience
  • Experience with PyTorch/JAX/TensorFlow
  • Knowledge of distributed libraries and frameworks
  • Experience with end-to-end model training

Benefits For Software Engineer- AI/ML, AWS Neuron Machine Learning Distributed Training, ML Accuracy

Medical Insurance
401k
Mental Health Assistance
  • Medical, financial, and other benefits
  • Flexible working hours
  • Career growth opportunities
  • Mentorship program
  • Work-life balance focus
