Software Development Engineer, ML Infrastructure Team

Amazon Web Services (AWS) subsidiary focused on building software and hardware for machine learning and cloud computing solutions.
$129,300 - $223,600
Machine Learning
Mid-Level Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS

Description For Software Development Engineer, ML Infrastructure Team

Join AWS's Machine Learning Infrastructure team as a Software Development Engineer at Annapurna Labs, an AWS subsidiary that builds ML and HPC capabilities for the cloud. This role focuses on building and maintaining the infrastructure that monitors and reports on the functionality and performance of large-scale testing workloads.

As a key member of the team, you'll work with technologies such as AWS Trainium, AWS Neuron, and Elastic Fabric Adapter (EFA). Your responsibilities include developing CI/CD automation, implementing ML and HPC benchmarks, and building monitoring dashboards with Amazon Managed Grafana and Amazon Athena.

The position offers the opportunity to write infrastructure as code in TypeScript with the AWS Cloud Development Kit (CDK), manage SLURM-based scheduling systems, and develop tooling for cluster management. You'll be part of a team focused on making AWS the most cost-effective platform for AI at scale.

The role combines software engineering excellence with ML infrastructure expertise, requiring strong skills in Python, TypeScript, and Linux systems. You'll work in Seattle, WA, with a competitive salary range of $129,300 to $223,600, depending on experience and location.

This is an ideal position for someone who enjoys working with cutting-edge ML technologies, has a passion for automation and infrastructure, and wants to impact how AI workloads are deployed at scale. You'll be part of an innovative team that's directly influencing the future of machine learning in the cloud.

Responsibilities For Software Development Engineer, ML Infrastructure Team

  • Build and maintain infrastructure for monitoring and reporting on functionality and performance of testing workloads
  • Automate software delivery using internal Amazon CI/CD tools
  • Develop Python code for managing large clusters and running ML/HPC workloads
  • Create and maintain dashboards using Amazon Managed Grafana and Amazon Athena
  • Implement automatic regression detection mechanisms
  • Manage complex infrastructure across multiple instance types and software stacks
  • Design and implement infrastructure as code using TypeScript and CDK
  • Optimize cluster scheduling with SLURM and manage Active Directory integration

Requirements For Software Development Engineer, ML Infrastructure Team

Python
TypeScript
Linux
Kubernetes
  • 3+ years of non-internship professional software development experience
  • 2+ years of design or architecture experience
  • Experience programming with at least one programming language
  • Experience developing highly automated CI/CD pipelines (Jenkins preferred)
  • Proficiency working with Linux, including containers
  • Experience with clustered ML or HPC applications or benchmarks
  • Experience coding in Python, TypeScript, and CDK
  • Experience creating automated dashboards and visualizations (such as Grafana)

Benefits For Software Development Engineer, ML Infrastructure Team

Medical Insurance
401k
  • Full range of medical benefits
  • Financial benefits including 401k
  • Equity compensation
  • Sign-on payments
  • Comprehensive employee benefits package

Jobs Related To Amazon Software Development Engineer, ML Infrastructure Team

Software Development Engineer (ML), AGI Customization

ML Engineer position on Amazon's AGI team focusing on LLM development, fine-tuning, and model optimization, offering competitive compensation and growth opportunities.

Machine Learning Engineer - Automated Optical Inspection, Center for Quantum Computing

ML Engineer role at AWS Center for Quantum Computing, focusing on optical defect detection models and ML-driven solutions for quantum computing manufacturing.

DFT Design Engineer, AWS Machine Learning Acceleration

Design and optimize hardware for AWS data centers as a DFT Design Engineer, focusing on implementing state-of-the-art Design for Test architectures within the AWS Machine Learning Acceleration team.

Software Development Engineer II, P13N

Software Development Engineer II position on Amazon's Personalization team, building ML-powered product understanding solutions with a compensation range of $129K-$223K.

Software Development Engineer - Sponsored Products, Demand Utilization Entity Sourcing Delivery

Machine Learning Software Engineer role at Amazon Advertising, focusing on high-throughput search systems and ads matching, combining technical expertise with leadership responsibilities.