Software Engineer- AI/ML, AWS Neuron

Amazon Web Services (AWS) is a leading cloud computing platform providing scalable and reliable cloud services.
$129,300 - $223,600
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS

Description For Software Engineer- AI/ML, AWS Neuron

AWS Neuron is seeking a talented Software Engineer to join its Machine Learning Applications (ML Apps) team. This role focuses on developing and optimizing software for AWS Inferentia and Trainium, AWS's cloud-scale machine learning accelerators.

The position involves working with cutting-edge ML technologies, including large language models such as GPT-2 and GPT-3 as well as Stable Diffusion and Vision Transformers. You'll collaborate closely with chip architects, compiler engineers, and runtime engineers to create distributed training solutions using Trn1.

Key responsibilities include:

  • Leading development of distributed training and inference support for PyTorch, TensorFlow, and JAX
  • Performance tuning and optimization of ML models for AWS Trainium and Inferentia silicon
  • Working with FSDP, DeepSpeed, and other distributed training libraries

AWS offers an inclusive culture with 10 employee-led affinity groups and 190 chapters worldwide. The team values work-life balance and provides flexibility in working hours. You'll have opportunities for mentorship and career growth in a supportive environment that celebrates knowledge sharing.

The compensation package includes:

  • Base salary range: $129,300 - $223,600 per year (varies by location)
  • Comprehensive medical, financial, and other benefits
  • Potential equity and sign-on payments
  • Employee benefits through Amazon's total compensation package

Join a team that embraces diversity, follows Amazon's 16 Leadership Principles, and is dedicated to building innovative ML solutions at scale.


Responsibilities For Software Engineer- AI/ML, AWS Neuron

  • Develop distributed training and inference support for PyTorch, TensorFlow, and JAX
  • Tune performance and optimize ML models for AWS Trainium and Inferentia
  • Work with large language models such as GPT-2 and GPT-3, as well as Stable Diffusion
  • Collaborate with chip architects and compiler engineers
  • Build distributed training solutions with Trn1

Requirements For Software Engineer- AI/ML, AWS Neuron

Skills: Python, Kubernetes
  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship design or architecture experience
  • Experience programming in at least one programming language
  • Experience with Python and ML model training
  • Knowledge of distributed training libraries (FSDP, DeepSpeed)

Benefits For Software Engineer- AI/ML, AWS Neuron

  • Mental health assistance
  • Medical benefits
  • Financial benefits
  • Equity opportunities
  • Sign-on payments
  • Flexible working hours
  • Mentorship programs
  • Career growth opportunities
