Software Development Engineer II, AWS SageMaker Training

World's most comprehensive and broadly adopted cloud platform, pioneering cloud computing and continuous innovation.
$120,000 - $200,000
Machine Learning
Mid-Level Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS · Cloud

Description For Software Development Engineer II, AWS SageMaker Training

AWS AI is seeking a dedicated Software Development Engineer II to join their SageMaker Training team, focusing on building next-generation AI compute platforms optimized for LLMs and distributed training. This role sits within AWS Utility Computing (UC), which provides foundational services like S3 and EC2, along with continuous product innovations. The position involves working with AI technology designed specifically for large-scale deep learning model training, including GPT models with 100+ billion parameters running across thousands of GPU devices.

The role offers an exciting opportunity to work at the forefront of AI technology, collaborating with ML scientists and customers to shape the future of cloud-based AI training solutions. You'll be responsible for designing and developing distributed machine learning systems that serve AWS's worldwide customer base. The position requires strong technical abilities in software development, particularly with multi-threaded asynchronous programming and experience with resource orchestrators or high-performance computing.

AWS values diverse experiences and maintains an inclusive culture through employee-led affinity groups and ongoing learning opportunities. The company offers strong career growth potential through mentorship and knowledge-sharing programs. Work-life harmony is emphasized, with flexibility built into the working culture. The role offers an opportunity to have significant impact on AWS and its global customer base.

Responsibilities For Software Development Engineer II, AWS SageMaker Training

  • Design, develop, test, and deploy distributed machine learning systems
  • Build and improve the next-generation AI platform
  • Collaborate with internal engineering teams and leading technology companies
  • Create innovative products to run at scale on the AI platform
  • Drive system architecture and spearhead best practices
  • Coach and develop junior engineers
  • Gather and analyze business and functional requirements
  • Translate requirements into technical specifications

Requirements For Software Development Engineer II, AWS SageMaker Training

Python
Go
Kubernetes
  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship design or architecture experience
  • Experience programming with at least one software programming language
  • Experience in multi-threaded asynchronous C++ or Go development
  • Experience with resource orchestrators such as Slurm or Kubernetes, high-performance computing, or large language model training

Benefits For Software Development Engineer II, AWS SageMaker Training

Medical Insurance
Visa Sponsorship
  • Work-life harmony
  • Career development opportunities
  • Mentorship programs
  • Inclusive work culture
  • Employee-led affinity groups
  • Ongoing learning experiences
  • Disability workplace accommodations
