Software Development Engineer, SageMaker

World's most comprehensive and broadly adopted cloud platform, pioneering cloud computing and continuous innovation.
$129,300 - $223,600
Distributed Systems
Senior Software Engineer
Hybrid
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS · Cloud

Description For Software Development Engineer, SageMaker

AWS AI is revolutionizing deep learning in the cloud through Amazon SageMaker, building customer-facing services for data scientists and software engineers. As customers increasingly adopt LLMs and Generative AI, we're developing a next-generation AI platform optimized for LLMs and distributed training.

The role focuses on the SageMaker HyperPod team, where you'll design, develop, and deploy distributed machine learning systems for worldwide customers. You'll work closely with ML scientists and customers to shape strategy and define roadmaps, while translating requirements into technical specifications for scalable solutions.

Key responsibilities include:

  • Developing innovative solutions for Large Language Model training across node clusters
  • Building and maintaining performant, resilient services for training large-scale foundation models
  • Optimizing distributed training through performance profiling and bottleneck resolution
  • Leading complex projects and serving as a technical resource throughout development
  • Mentoring junior engineers and driving best practices

The ideal candidate brings:

  • Strong background in large-scale software systems
  • Experience with multi-threaded asynchronous C++/Go development
  • Knowledge of Kubernetes, high-performance computing, and large language model training
  • Passion for building platforms that handle GPT models with 100+ billion parameters across thousands of GPU devices

Benefits include:

  • Flexible hybrid work options
  • Comprehensive mentorship and career growth opportunities
  • Inclusive team culture with employee-led affinity groups
  • Focus on work-life harmony
  • Competitive compensation package including equity and benefits

Join AWS to have a significant impact on cloud computing and serve customers worldwide while working with cutting-edge AI technology.

Last updated 6 days ago

Responsibilities For Software Development Engineer, SageMaker

  • Develop solutions for Large Language Model training in node clusters
  • Build and maintain services for training large-scale foundation models
  • Optimize distributed training performance
  • Lead complex technical projects
  • Mentor junior engineers
  • Collaborate with ML scientists and customers
  • Define technical specifications and system architecture

Requirements For Software Development Engineer, SageMaker

Go
Kubernetes
  • 3+ years of non-internship professional software development experience
  • 2+ years of system design and architecture experience
  • Experience with at least one programming language
  • Experience with multi-threaded asynchronous C++/Go development
  • Knowledge of Kubernetes and high-performance computing
  • Experience in large language model training

Benefits For Software Development Engineer, SageMaker

Medical Insurance
Dental Insurance
Vision Insurance
Parental Leave
Education Budget
  • Flexible hybrid work options
  • Comprehensive health benefits
  • Career development and mentorship
  • Competitive base salary
  • Equity compensation
  • Work-life harmony
