Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS is a leading cloud infrastructure company. Annapurna Labs, acquired in 2015, serves as AWS's infrastructure provider.
$129,300 - $223,600
Machine Learning
Mid-Level Software Engineer
In-Person
3+ years of experience
AI · Enterprise SaaS

Description For Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS Neuron is seeking a talented Software Engineer to join its Machine Learning Applications (ML Apps) team. This role is part of the innovative Annapurna Labs organization, which was acquired by AWS in 2015 and serves as the infrastructure backbone of AWS.

The position focuses on developing and optimizing AWS Neuron, the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators. You'll be working with cutting-edge ML technologies, including large language models like GPT-2 and GPT-3 as well as Stable Diffusion and Vision Transformers.

As a Software Engineer II, you'll collaborate with chip architects, compiler engineers, and runtime engineers to create sophisticated distributed training solutions. Your responsibilities will include implementing distributed training support in frameworks like PyTorch and TensorFlow, optimizing model performance on AWS Trainium and Inferentia silicon, and working with various ML model families.

The role offers an exciting opportunity to work at the intersection of hardware and software, directly impacting how businesses leverage machine learning at scale. You'll be part of a team that has delivered groundbreaking products like AWS Nitro, ENA, EFA, Graviton, and F1 EC2 Instances.

AWS provides a supportive and inclusive work environment with a strong focus on work-life balance. The company offers comprehensive benefits, mentorship opportunities, and a culture that celebrates diversity through various employee-led affinity groups. You'll have the chance to grow professionally while working on challenging problems that affect millions of users worldwide.

The compensation is competitive, ranging from $129,300 to $223,600 per year, depending on location and experience, plus additional benefits and potential equity. This is an excellent opportunity for someone with strong software development skills and ML knowledge who wants to make a significant impact in the cloud computing and machine learning space.

Last updated a day ago

Responsibilities For Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts building distributed training support into PyTorch and TensorFlow using XLA
  • Work with Neuron compiler and runtime stacks
  • Tune ML models for highest performance on AWS Trainium and Inferentia silicon
  • Develop and enable various ML model families including GPT-2, GPT-3, Stable Diffusion, and Vision Transformers
  • Work with chip architects, compiler engineers and runtime engineers
  • Build and tune distributed training solutions on Trn1 instances

Requirements For Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Python
  • 3+ years of non-internship professional software development experience
  • 3+ years of non-internship design or architecture experience
  • Experience programming in at least one programming language
  • Deep Learning industry experience
  • Experience with PyTorch/JAX/TensorFlow
  • Knowledge of distributed training libraries and frameworks
  • Bachelor's degree in computer science or equivalent (preferred)

Benefits For Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Medical insurance
  • 401(k)
  • Medical, financial, and other benefits
  • Flexible working hours
  • Mentorship and career growth opportunities
  • Employee-led affinity groups
  • Work-life balance focus

Jobs Related To Amazon Software Engineer- AI/ML, AWS Neuron Distributed Training

Support Engineer - Intelligent Document Processing

Support Engineer role at Amazon focusing on AI and compliance, implementing LLMs for document validation. Requires 2+ years of experience.

Data Scientist II, Enterprise Engineering

Data Scientist role focused on developing and implementing machine learning models for Amazon's Enterprise Engineering team.

Machine Learning Engineer, Computer Vision & Remote Sensing, Proserve

AWS seeks a Computer Vision Engineer for its federal services team to develop ML solutions using satellite imagery, medical imaging, and remote sensing capabilities.

Language Engineer II, Amazon Transcribe

Language Engineer II position at AWS focusing on natural language data collection and GenAI services development.
