Software Engineer- AI/ML, AWS Neuron Distributed Training

Amazon

AWS infrastructure provider specializing in silicon engineering, hardware design, software, and operations.

Cupertino, CA, USA

$129,300 - $223,600

Machine Learning

Mid-Level Software Engineer

In-Person

5,000+ Employees

3+ years of experience

AI · Enterprise SaaS

Description For Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS Neuron is seeking a Software Development Engineer II to join its Machine Learning Applications team, focusing on distributed training solutions. This role is part of Annapurna Labs, acquired by AWS in 2015, which serves as the infrastructure provider for AWS. The position involves working on AWS Neuron, the complete software stack for AWS Inferentia and Trainium cloud-scale machine learning accelerators.

The role focuses on developing and optimizing distributed training support for large-scale ML models, including GPT-2, GPT-3, Stable Diffusion, and Vision Transformers. You'll work closely with chip architects, compiler engineers, and runtime engineers to create and tune distributed training solutions for Trn1 systems. Experience with Python and distributed training libraries such as FSDP and DeepSpeed is essential.

The team emphasizes work-life balance and inclusive culture, with strong support for new members through mentorship and knowledge sharing. AWS offers comprehensive benefits and opportunities for career growth. The position involves working with cutting-edge ML infrastructure and contributing to systems that impact millions of users worldwide.

Key responsibilities include implementing distributed training support in major frameworks, optimizing performance for AWS silicon, and collaborating across teams to deliver high-performance ML solutions. The role requires both strong software development skills and deep ML knowledge, making it ideal for candidates with experience in both areas.

The position offers competitive compensation based on location and experience, along with equity opportunities and comprehensive benefits. AWS maintains a strong commitment to diversity and inclusion, reflected in its leadership principles and workplace culture.

Last updated 3 months ago

Responsibilities For Software Engineer- AI/ML, AWS Neuron Distributed Training

Lead efforts building distributed training support into PyTorch and TensorFlow using XLA
Develop and maintain Neuron compiler and runtime stacks
Tune ML models for optimal performance on AWS Trainium and Inferentia silicon
Work with chip architects and compiler engineers on distributed training solutions
Enable and performance-tune various ML model families, including GPT-2, GPT-3, and Stable Diffusion

Requirements For Software Engineer- AI/ML, AWS Neuron Distributed Training

Python

3+ years of non-internship professional software development experience
3+ years of system design and architecture experience
Experience programming with at least one software programming language
Deep Learning industry experience
Experience with full software development life cycle
Bachelor's degree in computer science or equivalent (preferred)

Benefits For Software Engineer- AI/ML, AWS Neuron Distributed Training

Medical Insurance

Mental Health Assistance

Work-life balance
Flexible working hours
Mentorship opportunities
Career growth opportunities
Medical benefits
Employee-led affinity groups
