Software Development Engineer II, ML Acceleration

Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform, pioneering cloud computing and continuously innovating.
$129,300 - $223,600
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS · Cloud

Description For Software Development Engineer II, ML Acceleration

AWS Neuron is the complete software stack for AWS Inferentia (Inf1/Inf2) and AWS Trainium (Trn1), our cloud-scale machine learning accelerators. This role is for a machine learning engineer on the Inference team for AWS Neuron. The engineer will be responsible for developing, enabling, and performance-tuning a wide variety of ML model families. These include massive-scale large language models (LLMs) such as GPT and Llama, as well as Stable Diffusion, Vision Transformers (ViT), and many more.

The ML Inference team works side by side with chip architects, compiler engineers, and runtime engineers to create, build, and optimize distributed inference solutions on Trainium and Inferentia instances. Experience training these large models and optimizing their inference with Python and C++ is a must. Model parallelization, quantization, and memory optimization are central to this role, as are vLLM, DeepSpeed, and other distributed inference libraries. Extending these libraries for Neuron-based systems is a key part of the work.

Key responsibilities include:

  • Leading efforts to achieve the best distributed training and inference performance for PyTorch, JAX, TensorFlow with XLA, and other advanced frameworks on the Neuron stack.
  • Optimizing models for maximum performance and efficiency on custom AWS Trainium and Inferentia silicon and on Trn1, Inf1, and Inf2 servers.
  • Strong software development skills (Python and C++) and machine learning knowledge are critical to this role.

This position offers an opportunity to work on cutting-edge ML accelerator technology and contribute to the development of AWS's cloud-scale machine learning infrastructure.

Responsibilities For Software Development Engineer II, ML Acceleration

  • Lead efforts to build and optimize distributed training and inference performance
  • Develop and tune ML models for AWS Inferentia and Trainium accelerators
  • Work with chip architects, compiler engineers, and runtime engineers
  • Optimize large-scale ML models, including LLMs, for performance and efficiency
  • Extend distributed inference libraries for Neuron-based systems

Requirements For Software Development Engineer II, ML Acceleration

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship design or architecture experience
  • Experience programming in at least one programming language
  • 3+ years of full software development life cycle experience
  • Bachelor's degree in computer science or equivalent
  • Strong software development skills in Python and C++
  • Experience with PyTorch, JAX, TensorFlow, and XLA
  • Knowledge of distributed training and inference optimization
  • Familiarity with model parallelization, quantization, and memory optimization techniques
  • Experience with vLLM, DeepSpeed, and other distributed inference libraries

Benefits For Software Development Engineer II, ML Acceleration

  • Medical Insurance
  • Financial Benefits
  • Other Benefits