AWS Neuron is seeking a talented Software Engineer to join its Machine Learning Applications (ML Apps) team, which owns the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators. The role offers the chance to work at the forefront of machine learning infrastructure, specifically on massive-scale language models such as Llama 3, Mixtral, and DBRX.
The position involves close collaboration with chip architects, compiler engineers, and runtime engineers to develop and optimize distributed training solutions. You'll implement distributed training support in frameworks such as PyTorch and JAX while extracting maximum performance from AWS Trainium and Inferentia silicon.
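For a rough sense of what this work touches, below is a minimal sketch of a data-parallel training step written against PyTorch/XLA, the interface that AWS's torch-neuronx package builds on for Trainium. The model, shapes, and loop here are illustrative assumptions for the sketch, not the team's actual code or workloads.

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm


def train_step(model: nn.Module, optimizer, batch, labels):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(batch), labels)
    loss.backward()
    # xm.optimizer_step all-reduces gradients across data-parallel workers
    # before applying the parameter update.
    xm.optimizer_step(optimizer)
    return loss


def main():
    # On a Trainium instance with torch-neuronx installed, the XLA device
    # resolves to a NeuronCore (assumption for this sketch).
    device = xm.xla_device()
    model = nn.Linear(1024, 1024).to(device)  # stand-in for a real LLM
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):
        batch = torch.randn(8, 1024, device=device)
        labels = torch.randint(0, 1024, (8,), device=device)
        train_step(model, optimizer, batch, labels)
        # Flush the lazily traced graph so XLA compiles and executes the step.
        xm.mark_step()


if __name__ == "__main__":
    main()
```

In practice, much of the role sits below this level: making sure graphs like this compile to efficient Neuron kernels and scale across many accelerators.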
The ideal candidate will bring strong software development skills combined with deep ML knowledge. You'll be working in an inclusive environment that values diversity and work-life balance. Amazon offers comprehensive benefits, including medical and financial packages, and emphasizes career growth through mentorship and knowledge sharing.
The role offers competitive compensation ranging from $129,300 to $223,600, depending on location and experience. You'll be part of a team that embraces Amazon's 16 Leadership Principles, including seeking diverse perspectives and earning trust, while working on cutting-edge ML infrastructure.
Join a team that's dedicated to supporting new members and fostering an environment of continuous learning and professional development. You'll have the chance to shape the future of machine learning infrastructure while working with some of the most advanced AI accelerator technologies in the industry.