Software Engineer- AI/ML, AWS Neuron Distributed Training

Amazon Web Services (AWS) is the world's leading cloud platform, with Annapurna Labs serving as its infrastructure provider.
$129,300 - $223,600
Machine Learning
Mid-Level Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS · Cloud

Description For Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS Neuron is seeking a Software Development Engineer II to join its Machine Learning Applications team, focusing on distributed training solutions. The role sits within Annapurna Labs, AWS's infrastructure provider, which Amazon acquired in 2015. The position involves working on AWS Neuron, the complete software stack for AWS Inferentia and Trainium, AWS's cloud-scale machine learning accelerators.

The role combines deep machine learning expertise with software engineering and involves work on massive-scale language models such as GPT-2 and GPT-3, as well as vision transformers. You'll collaborate with chip architects and compiler engineers to optimize performance on AWS's custom silicon. The team emphasizes work-life balance and provides strong mentorship opportunities.

This is an opportunity to work at the intersection of machine learning and distributed systems, building solutions used by millions of customers worldwide. The position offers competitive compensation ($129,300 to $223,600, depending on location) and comprehensive benefits. You'll be part of AWS's inclusive culture, with access to employee-led affinity groups and ongoing learning experiences.

The role requires 3+ years of professional software development experience and deep learning industry experience; experience with distributed training libraries and frameworks is preferred. You'll be responsible for building and optimizing ML solutions using PyTorch, TensorFlow, and AWS's custom hardware accelerators. This is a chance to work on cutting-edge ML infrastructure while being part of a team that values knowledge sharing and professional growth.


Responsibilities For Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts to build distributed training support into PyTorch and TensorFlow using XLA
  • Work with Neuron compiler and runtime stacks
  • Tune ML models for peak performance
  • Maximize efficiency of models running on AWS Trainium and Inferentia silicon
  • Develop and enable various ML model families, including GPT-2, GPT-3, Stable Diffusion, and Vision Transformers
  • Work with chip architects, compiler engineers, and runtime engineers
  • Create, build, and tune distributed training solutions on Trn1 instances

Requirements For Software Engineer- AI/ML, AWS Neuron Distributed Training

Python
Java
  • 3+ years of non-internship professional software development experience
  • 3+ years of non-internship design or architecture experience
  • Experience programming with at least one software programming language
  • Industry experience in deep learning
  • Bachelor's degree in computer science or equivalent (preferred)
  • Experience with PyTorch/JAX/TensorFlow (preferred)
  • Experience with distributed libraries and frameworks (preferred)
  • End-to-end model training experience (preferred)

Benefits For Software Engineer- AI/ML, AWS Neuron Distributed Training

Medical Insurance
401k
  • Medical benefits
  • Financial benefits
  • Work-life balance
  • Mentorship opportunities
  • Career growth opportunities
  • Employee-led affinity groups
  • Inclusive work culture


Jobs Related To Amazon Software Engineer- AI/ML, AWS Neuron Distributed Training

Software Development Engineer

Software Development Engineer role on Amazon's AGI team, focusing on building advanced AI systems for customer understanding and personalization across Amazon products.

Software Development Engineer II, Amazon

Amazon SDE II role focusing on AWS and ML technologies to build customer-centric solutions for Private Brands, offering competitive compensation and growth opportunities.

Systems Engineer, AI/ML

Systems Engineer position at AWS focusing on AI/ML services, combining cloud infrastructure expertise with artificial intelligence systems support.

Software Engineer- AI/ML, AWS Neuron

Software Engineer position for AWS Neuron team working on AI/ML infrastructure and distributed training solutions.

Software Engineer- AI/ML, AWS Neuron Distributed Training

Senior Software Engineer position at AWS Neuron focusing on distributed training solutions for machine learning, working with cutting-edge ML accelerators and frameworks.