Software Engineer - AI/ML, AWS Neuron Distributed Training - Multimodal

Amazon

World's most comprehensive and broadly adopted cloud platform, pioneering cloud computing and continuous innovation.

Seattle, WA, USA • San Francisco, CA, USA

$129,300 - $223,600

Machine Learning

Mid-Level Software Engineer

In-Person

5,000+ Employees

2+ years of experience

AI · Enterprise SaaS · Cloud

Description For Software Engineer - AI/ML, AWS Neuron Distributed Training - Multimodal

AWS Utility Computing (UC) is at the forefront of cloud innovation, specifically within the Annapurna Labs division. This role focuses on the AWS Neuron software stack for AWS Inferentia and Trainium, our cloud-scale Machine Learning accelerators. As a machine learning engineer on the Distributed Training team, you'll be responsible for developing and optimizing various ML models, including Large Language Models like GPT and Llama, as well as Stable Diffusion and Vision Transformers.

The position requires expertise in distributed training libraries such as FSDP and DeepSpeed, working directly with custom AWS silicon. You'll collaborate with a diverse team of chip architects and engineers to build and enhance distributed training solutions. The role combines deep machine learning knowledge with strong software development skills.

AWS values diverse experiences and maintains an inclusive culture that celebrates knowledge-sharing and mentorship. The team supports professional growth through one-on-one mentoring and constructive code reviews. AWS pioneered cloud computing and continues to innovate, serving customers from startups to Global 500 companies.

The company emphasizes work-life harmony and provides comprehensive benefits including medical, financial, and equity compensation. Employee-led affinity groups and ongoing learning experiences, including Conversations on Race and Ethnicity (CORE) and AmazeCon conferences, foster an inclusive environment where differences are celebrated.

This is an opportunity to work on cutting-edge ML infrastructure at scale, with competitive compensation ranging from $129,300 to $223,600 based on location and experience. Join a team that's pushing the boundaries of what's possible in cloud computing and machine learning acceleration.

Last updated 2 days ago

Responsibilities For Software Engineer - AI/ML, AWS Neuron Distributed Training - Multimodal

Lead efforts building distributed training support into PyTorch and TensorFlow
Tune ML models for highest performance on AWS Trainium and Inferentia silicon
Work with chip architects, compiler engineers and runtime engineers
Develop and enable performance tuning of ML model families including LLMs
Create and optimize distributed training solutions with Trainium instances

Requirements For Software Engineer - AI/ML, AWS Neuron Distributed Training - Multimodal

Python

Java

TypeScript

Bachelor's degree in computer science or equivalent
2+ years of computer science fundamentals experience
2+ years of systems architecture and design experience
Experience programming with at least one software programming language
Experience in machine learning, data mining, statistics or natural language processing

Benefits For Software Engineer - AI/ML, AWS Neuron Distributed Training - Multimodal

Medical Insurance

Equity

Mental Health Assistance

Medical, financial, and other benefits
Equity compensation
Sign-on payments
Mentorship and career growth opportunities
Work-life harmony
Inclusive team culture
