Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS infrastructure provider specializing in silicon engineering, hardware design, software, and operations.
$129,300 - $223,600
Machine Learning
Mid-Level Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS

Description For Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS Neuron is seeking a Software Development Engineer II to join its Machine Learning Applications team, focusing on distributed training solutions. This role is part of Annapurna Labs, acquired by Amazon in 2015, which serves as the infrastructure provider for AWS. The position involves working on AWS Neuron, the complete software stack for AWS Inferentia and Trainium cloud-scale machine learning accelerators.

The role focuses on developing and optimizing distributed training support for large-scale ML models, including GPT-2, GPT-3, Stable Diffusion, and Vision Transformers. You'll work closely with chip architects, compiler engineers, and runtime engineers to create and tune distributed training solutions for Trn1 systems. Experience with Python and distributed training libraries like FSDP and DeepSpeed is essential.

The team emphasizes work-life balance and inclusive culture, with strong support for new members through mentorship and knowledge sharing. AWS offers comprehensive benefits and opportunities for career growth. The position involves working with cutting-edge ML infrastructure and contributing to systems that impact millions of users worldwide.

Key responsibilities include implementing distributed training support in major frameworks, optimizing performance for AWS silicon, and collaborating across teams to deliver high-performance ML solutions. The role requires both strong software development skills and deep ML knowledge, making it ideal for candidates with experience in both areas.

The position offers competitive compensation based on location and experience, along with equity opportunities and comprehensive benefits. AWS maintains a strong commitment to diversity and inclusion, reflected in its leadership principles and workplace culture.

Last updated 12 days ago

Responsibilities For Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts building distributed training support into PyTorch and TensorFlow using XLA
  • Develop and maintain Neuron compiler and runtime stacks
  • Tune ML models for optimal performance on AWS Trainium and Inferentia silicon
  • Work with chip architects and compiler engineers on distributed training solutions
  • Enable and performance-tune various ML model families, including GPT-2, GPT-3, and Stable Diffusion

Requirements For Software Engineer- AI/ML, AWS Neuron Distributed Training

Python
  • 3+ years of non-internship professional software development experience
  • 3+ years of system design and architecture experience
  • Experience programming with at least one software programming language
  • Deep Learning industry experience
  • Experience with full software development life cycle
  • Bachelor's degree in computer science or equivalent (preferred)

Benefits For Software Engineer- AI/ML, AWS Neuron Distributed Training

Medical Insurance
Mental Health Assistance
  • Work-life balance
  • Flexible working hours
  • Mentorship opportunities
  • Career growth opportunities
  • Medical benefits
  • Employee-led affinity groups
