Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform, pioneering cloud computing innovation.
$151,300 - $261,500
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Annapurna Labs, now fully integrated with AWS after its 2015 acquisition, is seeking a Senior Machine Learning Engineer for its Distributed Training team. This role focuses on AWS Neuron, the complete software stack for AWS Trainium and Inferentia, AWS's cloud-scale machine learning accelerators. The position involves working with cutting-edge ML technologies, including large language models such as GPT and Llama, as well as Stable Diffusion and Vision Transformers.

The role demands expertise in distributed training libraries such as FSDP, DeepSpeed, and NeMo, with a focus on extending these capabilities for Neuron-based systems. You'll collaborate with cross-functional teams, including chip architects and compiler engineers, to optimize performance on AWS custom silicon platforms.

AWS values diverse experiences and maintains an inclusive culture through employee-led affinity groups and ongoing learning experiences. The team emphasizes knowledge-sharing and mentorship, supporting both professional and personal growth. AWS offers competitive compensation, including equity and comprehensive benefits, reflecting Amazon's commitment to being Earth's Best Employer.

The position offers exposure to groundbreaking technology in cloud computing and machine learning, working on products that directly impact AWS's infrastructure. You'll be part of a team that has delivered significant products like AWS Nitro, Graviton, and ML Accelerators, contributing to solutions that help customers tackle previously unimaginable technical challenges.

This role presents an exceptional opportunity for experienced engineers passionate about machine learning and distributed systems to work at the forefront of cloud technology, while enjoying a supportive, inclusive work environment that values work-life harmony and continuous learning.

Last updated 2 months ago

Responsibilities For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

  • Lead efforts to build distributed training support into PyTorch and JAX using XLA
  • Optimize models to achieve peak performance on AWS custom silicon
  • Work with chip architects, compiler engineers and runtime engineers
  • Create, build and tune distributed training solutions with Trainium instances
  • Develop, enable, and performance-tune ML model families

Requirements For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Python
Java
  • Bachelor's degree in computer science or equivalent
  • 5+ years of non-internship professional software development experience
  • 5+ years of programming experience in at least one programming language
  • 5+ years of leading design or architecture of new and existing systems
  • 5+ years of full software development life cycle experience
  • Experience as a mentor, tech lead or leading an engineering team
  • Experience in machine learning, data mining, statistics or natural language processing

Benefits For Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

Medical Insurance
Equity
Mental Health Assistance
  • Medical, financial, and other benefits
  • Equity compensation
  • Sign-on payments
  • Mentorship and career growth opportunities
  • Work-life harmony

