Software Development Engineer, HPC/ML Interconnect Engineer

Amazon Web Services (AWS) is a leading cloud computing platform providing a wide range of services including compute, storage, and AI/ML solutions.
$129,300 - $223,600
Distributed Systems
Senior Software Engineer
Hybrid
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS · Cloud

Description For Software Development Engineer, HPC/ML Interconnect Engineer

We are seeking an experienced software engineer with low-level, low-latency networking or interconnect expertise to improve the customer experience by designing systems that scale network-intensive workloads across thousands of CPUs, GPUs, and TPUs. This role is at the forefront of AI/ML, focusing on optimizing networking for the latest AI workloads such as large language models (LLMs).

As part of the AWS Utility Computing (UC) organization, you'll support the development and management of AWS services spanning Compute, Database, Storage, IoT, Platform, and Productivity Apps. You'll work within Annapurna Labs, which designs the silicon and software that accelerate innovation in cloud solutions.

Key responsibilities:

  • Design and optimize networking solutions for Machine Learning (ML) and High-Performance Computing (HPC) workloads on AWS
  • Collaborate with cross-functional teams and engage with customers to gather feedback and improve offerings
  • Develop low-latency networking and collective operations for HPC network fabrics or machine learning accelerator cluster systems
  • Troubleshoot complex networking issues and implement solutions at scale

Required skills:

  • Extensive experience in low-latency networking and collective operations
  • Proficiency in C/C++ and deep understanding of Linux and kernel-level programming
  • Strong problem-solving skills and ability to troubleshoot complex networking issues
  • Excellent communication skills for effective collaboration in a team environment

The role offers opportunities to work on cutting-edge AI/ML technologies, participate in innovative learning experiences, and benefit from a diverse and inclusive team culture. AWS values work-life balance and offers flexible working hours.

Join the Elastic Collectives team at Annapurna Labs and be part of shaping the future of networking solutions for ML and HPC workloads on AWS!


Responsibilities For Software Development Engineer, HPC/ML Interconnect Engineer

  • Design and optimize networking solutions for ML and HPC workloads
  • Collaborate with cross-functional teams
  • Engage with customers to gather feedback and improve offerings
  • Develop low-latency networking and collective operations
  • Troubleshoot complex networking issues
  • Implement solutions at scale

Requirements For Software Development Engineer, HPC/ML Interconnect Engineer

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship design or architecture experience
  • Experience programming with at least one software programming language
  • Extensive experience in low-latency networking and collective operations
  • Proficiency in C/C++
  • Deep understanding of Linux and kernel-level programming
  • Strong problem-solving skills
  • Excellent communication skills

Benefits For Software Development Engineer, HPC/ML Interconnect Engineer

  • Medical insurance
  • Dental insurance
  • Vision insurance
  • Flexible working hours
  • Career growth opportunities
  • Mentorship programs
  • Diverse and inclusive team culture
  • Work-life balance
