Software Engineer - AI/ML, AWS Neuron Distributed Training

Amazon

AWS is a leading cloud infrastructure company, with Annapurna Labs serving as AWS's infrastructure provider.

San Francisco, CA, USA

$129,300 - $223,600

Machine Learning

Mid-Level Software Engineer

In-Person

5,000+ Employees

3+ years of experience

AI · Enterprise SaaS

Description For Software Engineer - AI/ML, AWS Neuron Distributed Training

AWS Neuron is seeking a Software Engineer II to join their Machine Learning Applications team, focusing on distributed training solutions. This role is part of Annapurna Labs, acquired by AWS in 2015, which serves as the infrastructure provider for AWS. The position involves working with cutting-edge ML technologies, including AWS Inferentia and Trainium cloud-scale machine learning accelerators.

The role requires expertise in distributed training libraries like FSDP and DeepSpeed, and involves close collaboration with chip architects, compiler engineers, and runtime engineers. You'll be responsible for developing and optimizing support for various ML model families, including large language models like GPT2/GPT3, Stable Diffusion, and Vision Transformers.

AWS offers a strong emphasis on work-life balance, mentorship, and career growth. The company maintains an inclusive culture with ten employee-led affinity groups and innovative benefit offerings. The team values knowledge sharing and supports new members through a broad mix of experience levels and tenures.

This position offers competitive compensation ranging from $129,300 to $223,600 based on geographic location, plus equity and comprehensive benefits. The role presents significant opportunities for working with large-scale systems and contributing to AWS's continued innovation in cloud infrastructure and machine learning acceleration.

Responsibilities For Software Engineer - AI/ML, AWS Neuron Distributed Training

Lead efforts building distributed training support into PyTorch and TensorFlow using XLA
Work with Neuron compiler and runtime stacks
Tune models for highest performance on AWS Trainium and Inferentia silicon
Develop and enable ML model families including GPT2, GPT3, Stable Diffusion, and Vision Transformers
Work with chip architects, compiler engineers and runtime engineers
Create, build and tune distributed training solutions with Trn1

Requirements For Software Engineer - AI/ML, AWS Neuron Distributed Training

Python

3+ years of non-internship professional software development experience
3+ years of non-internship design or architecture experience
Experience programming in at least one software programming language
Deep Learning industry experience
Experience with PyTorch/JAX/TensorFlow
Knowledge of distributed libraries and frameworks
End-to-end Model Training experience

Benefits For Software Engineer - AI/ML, AWS Neuron Distributed Training

Medical Insurance

Medical, financial, and other benefits
Flexible working hours
Mentorship and career growth opportunities
Employee-led affinity groups
Work-life balance focus
