Software Development Engineer - AI/ML, AWS Neuron Apps

Amazon is a global technology company known for e-commerce, cloud computing, and artificial intelligence.
$129,300 - $223,600
Machine Learning
Senior Software Engineer
Hybrid
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS

Description For Software Development Engineer - AI/ML, AWS Neuron Apps

AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators and the Trn1 and Inf1 servers that use them. This role is for a software engineer on the Machine Learning Applications (ML Apps) team for AWS Neuron. You will be responsible for development, enablement, and performance tuning of a wide variety of ML model families, including massive-scale large language models like Llama2, GPT2, GPT3, and beyond, as well as Stable Diffusion, Vision Transformers, and many more.

Key responsibilities include:

  • Leading efforts to build distributed inference support into PyTorch and TensorFlow using XLA, and into the Neuron compiler and runtime stacks
  • Tuning models to ensure the highest performance and maximize efficiency on AWS Trainium and Inferentia silicon and the Trn1 and Inf1 servers
  • Designing and coding solutions to drive efficiencies in software architecture
  • Creating metrics, implementing automation, and resolving root causes of software defects
  • Building high-impact solutions for a large customer base
  • Participating in design discussions, code reviews, and communicating with internal and external stakeholders
  • Working cross-functionally to drive business decisions with technical input

The ideal candidate will have strong software development skills using C++/Python and deep ML knowledge. Experience optimizing inference performance for both latency and throughput on large models using Python, PyTorch, or JAX is essential. Familiarity with DeepSpeed and other distributed inference libraries is crucial.

You'll be working in a startup-like development environment, always focusing on the most important tasks. The team has a mix of experience levels and tenures and is dedicated to supporting new members. It celebrates knowledge sharing and mentorship, with senior members providing one-on-one mentoring and thorough code reviews.

Join AWS Neuron and be at the forefront of cloud-scale machine learning acceleration!

Last updated 2 months ago

Responsibilities For Software Development Engineer - AI/ML, AWS Neuron Apps

  • Develop and enable ML model families, including large language models
  • Build distributed inference support into PyTorch and TensorFlow using XLA
  • Tune models for performance on AWS Trainium and Inferentia silicon
  • Design and code solutions for software architecture efficiency
  • Create metrics and implement automation
  • Resolve root causes of software defects
  • Participate in design discussions and code reviews
  • Communicate with internal and external stakeholders
  • Work cross-functionally to drive business decisions

Requirements For Software Development Engineer - AI/ML, AWS Neuron Apps

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship design or architecture experience
  • Experience programming with at least one software programming language
  • Strong software development skills using C++/Python
  • Deep ML knowledge
  • Experience optimizing inference performance for large models using Python, PyTorch, or JAX
  • Familiarity with DeepSpeed and other distributed inference libraries

Benefits For Software Development Engineer - AI/ML, AWS Neuron Apps

  • Medical Insurance
  • 401k
  • Equity
