Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS is a leading cloud infrastructure company. Annapurna Labs, acquired in 2015, serves as its infrastructure provider.
$129,300 - $223,600
Machine Learning
Mid-Level Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS

Description For Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS (Amazon Web Services) is seeking a talented Software Engineer II to join the Annapurna Labs team, specifically working on the Machine Learning Applications (ML Apps) team for AWS Neuron. This role is at the intersection of cloud infrastructure and cutting-edge machine learning technology.

The position focuses on developing and optimizing the AWS Neuron software stack, which powers the AWS Inferentia and Trainium cloud-scale machine learning accelerators. You'll be responsible for enabling and performance-tuning various ML model families, including large language models like GPT-2 and GPT-3, Stable Diffusion, and Vision Transformers.

As a key member of the ML Distributed Training team, you'll collaborate closely with chip architects, compiler engineers, and runtime engineers. Your primary focus will be on building distributed training support into frameworks like PyTorch and TensorFlow, working with XLA and the Neuron compiler and runtime stacks. The role requires both strong software development skills and deep machine learning knowledge.

The team operates within AWS's larger infrastructure ecosystem, where Annapurna Labs (acquired by AWS in 2015) serves as a crucial infrastructure provider. The organization spans multiple disciplines, including silicon engineering, hardware design and verification, software, and operations. Their impressive portfolio includes products like AWS Nitro, ENA, EFA, Graviton and F1 EC2 Instances, AWS Neuron, Inferentia and Trainium ML Accelerators.

AWS offers a supportive and inclusive work environment with a strong emphasis on work-life balance. The company provides comprehensive benefits, mentorship opportunities, and a culture that celebrates knowledge sharing. With ten employee-led affinity groups reaching 40,000 employees globally, AWS is committed to fostering diversity and inclusion.

This role offers an exciting opportunity to work on cutting-edge ML infrastructure that impacts millions of users worldwide. You'll be at the forefront of developing solutions that help businesses leverage machine learning at scale, while working with some of the most advanced cloud and ML technologies available today.

Last updated 2 months ago

Responsibilities For Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts to build distributed training support into PyTorch and TensorFlow using XLA
  • Work with Neuron compiler and runtime stacks
  • Tune ML models for highest performance on AWS Trainium and Inferentia silicon
  • Develop and enable various ML model families including GPT-2, GPT-3, Stable Diffusion, and Vision Transformers
  • Work with chip architects, compiler engineers and runtime engineers

Requirements For Software Engineer- AI/ML, AWS Neuron Distributed Training

Python
  • 3+ years of non-internship professional software development experience
  • 3+ years of non-internship design or architecture experience
  • Experience programming in at least one programming language
  • Deep Learning industry experience
  • Experience with PyTorch/JAX/TensorFlow
  • Knowledge of distributed training libraries and frameworks

Benefits For Software Engineer- AI/ML, AWS Neuron Distributed Training

Medical Insurance
  • Medical, financial, and other benefits
  • Flexible working hours
  • Mentorship and career growth opportunities
  • Inclusive team culture
  • Employee-led affinity groups
