Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS is a leading cloud infrastructure company. Annapurna Labs, acquired by AWS in 2015, serves as its infrastructure provider.
$129,300 - $223,600
Machine Learning
Mid-Level Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS

Description For Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS (Amazon Web Services) is seeking a talented Software Engineer II to join the Annapurna Labs team, specifically working on the Machine Learning Applications (ML Apps) team for AWS Neuron. This role is at the intersection of cloud infrastructure and cutting-edge machine learning technology.

The position focuses on developing and optimizing the AWS Neuron software stack, which powers the AWS Inferentia and Trainium cloud-scale machine learning accelerators. You'll be responsible for enabling and performance-tuning various ML model families, including large language models like GPT-2 and GPT-3, Stable Diffusion, and Vision Transformers.

As a key member of the ML Distributed Training team, you'll collaborate closely with chip architects, compiler engineers, and runtime engineers. Your primary focus will be on building distributed training support into frameworks like PyTorch and TensorFlow, working with XLA and the Neuron compiler and runtime stacks. The role requires both strong software development skills and deep machine learning knowledge.

The team operates within AWS's larger infrastructure ecosystem, where Annapurna Labs (acquired by AWS in 2015) serves as a crucial infrastructure provider. The organization spans multiple disciplines, including silicon engineering, hardware design and verification, software, and operations. Its portfolio includes AWS Nitro, ENA, EFA, Graviton and F1 EC2 instances, AWS Neuron, and the Inferentia and Trainium ML accelerators.

AWS offers a supportive and inclusive work environment with a strong emphasis on work-life balance. The company provides comprehensive benefits, mentorship opportunities, and a culture that celebrates knowledge sharing. Its ten employee-led affinity groups reach 40,000 employees globally, reflecting a commitment to diversity and inclusion.

This role offers an exciting opportunity to work on cutting-edge ML infrastructure that impacts millions of users worldwide. You'll be at the forefront of developing solutions that help businesses leverage machine learning at scale, while working with some of the most advanced cloud and ML technologies available today.

Last updated 3 months ago

Responsibilities For Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts building distributed training support into PyTorch and TensorFlow using XLA
  • Work with Neuron compiler and runtime stacks
  • Tune ML models for highest performance on AWS Trainium and Inferentia silicon
  • Develop and enable various ML model families, including GPT-2, GPT-3, Stable Diffusion, and Vision Transformers
  • Work with chip architects, compiler engineers and runtime engineers

Requirements For Software Engineer- AI/ML, AWS Neuron Distributed Training

Python
  • 3+ years of non-internship professional software development experience
  • 3+ years of non-internship design or architecture experience
  • Experience programming with at least one software programming language
  • Deep Learning industry experience
  • Experience with PyTorch/JAX/TensorFlow
  • Knowledge of distributed training libraries and frameworks

Benefits For Software Engineer- AI/ML, AWS Neuron Distributed Training

Medical Insurance
  • Medical, financial, and other benefits
  • Flexible working hours
  • Mentorship and career growth opportunities
  • Inclusive team culture
  • Employee-led affinity groups
