Software Engineer- AI/ML, AWS Neuron Distributed Training

Amazon

AWS (Amazon Web Services) is a leading cloud infrastructure company that provides services to millions of customers worldwide.

San Francisco, CA, USA

$129,300 - $223,600

Machine Learning

Mid-Level Software Engineer

In-Person

5,000+ Employees

3+ years of experience

AI · Enterprise SaaS · Cloud

Description For Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS Neuron is seeking a Software Engineer to join its Machine Learning Applications team, focusing on distributed training solutions. The role is part of Annapurna Labs, which AWS acquired in 2015 and which serves as the infrastructure provider for AWS. The position involves working on AWS Neuron, the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators.

The role requires expertise in developing and optimizing distributed training support for major ML frameworks such as PyTorch and TensorFlow. You'll work closely with chip architects and compiler engineers to create efficient solutions for Trn1 systems. The position also involves performance tuning of various ML models, including large language models such as GPT-2 and GPT-3 as well as Stable Diffusion.

AWS offers a collaborative environment with a strong emphasis on work-life balance and professional growth. The team values knowledge sharing and mentorship and provides opportunities to work on complex projects that impact millions of users. The company offers comprehensive benefits and promotes an inclusive culture through employee-led affinity groups.

This is an excellent opportunity for engineers passionate about machine learning infrastructure who want to work at the intersection of hardware and software optimization. You'll be part of a team that's pushing the boundaries of ML acceleration and distributed computing, while enjoying the stability and resources of one of the world's leading tech companies.

Last updated a day ago

Responsibilities For Software Engineer- AI/ML, AWS Neuron Distributed Training

Lead efforts to build distributed training support into PyTorch and TensorFlow
Work with chip architects, compiler engineers, and runtime engineers
Create, build, and tune distributed training solutions on Trn1
Tune the performance of ML model families including GPT-2, GPT-3, and Stable Diffusion
Ensure the highest performance and maximum efficiency on AWS Trainium and Inferentia silicon

Requirements For Software Engineer- AI/ML, AWS Neuron Distributed Training

Python

TypeScript

3+ years of non-internship professional software development experience
3+ years of non-internship design or architecture experience
Experience programming in at least one programming language
Deep Learning industry experience
Experience with PyTorch/JAX/TensorFlow
Knowledge of distributed libraries and frameworks
Experience with end-to-end model training

Benefits For Software Engineer- AI/ML, AWS Neuron Distributed Training

Medical Insurance

Dental Insurance

Vision Insurance

401k

Medical, financial, and other benefits
Flexible working hours
Mentorship and career growth opportunities
Employee-led affinity groups
Work-life balance focus

