AWS AI is revolutionizing deep learning in the cloud through Amazon SageMaker, where we build customer-facing services for data scientists and software engineers. As customers increasingly adopt LLMs and generative AI, we are developing a next-generation AI platform optimized for LLMs and distributed training.
This role is on the SageMaker HyperPod team, where you'll design, develop, and deploy distributed machine learning systems for customers worldwide. You'll work closely with ML scientists and customers to shape strategy and define roadmaps, translating requirements into technical specifications for scalable solutions.
Key responsibilities include:
- Developing innovative solutions for Large Language Model training across multi-node clusters
- Building and maintaining performant, resilient services for training large-scale foundation models
- Optimizing distributed training through performance profiling and bottleneck resolution
- Leading complex projects and serving as a technical resource throughout development
- Mentoring junior engineers and driving best practices
The ideal candidate brings:
- Strong background in large-scale software systems
- Experience with multi-threaded asynchronous C++/Go development
- Knowledge of Kubernetes, high-performance computing, and large language model training
- Passion for building platforms that train 100+ billion-parameter GPT models across thousands of GPU devices
Benefits include:
- Flexible hybrid work options
- Comprehensive mentorship and career growth opportunities
- Inclusive team culture with employee-led affinity groups
- Work-life harmony focus
- Competitive compensation package including equity and benefits
Join AWS to make a significant impact on cloud computing, serving customers worldwide while working with cutting-edge AI technology.