Software Engineering Manager - GPU Communications Libraries

NVIDIA is the world leader in accelerated computing, pioneering solutions to tackle challenges no one else can solve.
$180,000 - $339,250
Distributed Systems
Principal Software Engineer
In-Person
10+ years of experience
AI · Enterprise SaaS

Description For Software Engineering Manager - GPU Communications Libraries

We are the GPU Communications Libraries and Networking team at NVIDIA. We deliver communication libraries such as NCCL, NVSHMEM, and UCX for Deep Learning (DL) and HPC. DL and HPC applications already have huge compute demands and run at scales of up to tens of thousands of GPUs. The GPUs are connected by high-speed interconnects (e.g., NVLink, PCIe) within a node and by high-speed networking (e.g., InfiniBand, Ethernet) across nodes.

Communication performance between GPUs has a direct impact on end-to-end application performance, and the stakes are even higher at huge scales! We are looking for a technical leader to manage our NVSHMEM and UCX libraries. This is an outstanding opportunity to push the limits of the state of the art and deliver platforms the world has never seen before.

What you will be doing:

  • Lead, mentor, and grow your library engineering team, and be responsible for planning and executing projects as well as for the quality and performance of your libraries.
  • Participate in feature design and implementation.
  • Interact with internal and external partners and researchers to understand their use cases and requirements.
  • Collaborate with engineering teams, program and product management, and partners to define the product roadmap.
  • Continuously review and identify improvement opportunities in established processes, infrastructure, and practices.

What we need to see:

  • 10+ years of overall experience in the software industry, with specialization in HPC networking or system software.
  • 4+ years of management experience.
  • BS, MS, or Ph.D. in CS, CE, EE, or a related technical field, or equivalent experience.
  • Prior development experience in systems software, communication runtimes, or high-performance networking software.
  • Strong understanding of computer system architecture, operating system principles, HW-SW interactions, and performance analysis and optimization.
  • Excellent C/C++ programming and debugging skills in Linux.
  • Experience balancing multiple projects with competing priorities.
  • Flexibility to work and communicate effectively across different teams and time zones.

Ways to stand out:

  • Experience with parallel programming models (MPI, SHMEM) and communication runtimes.
  • Background with RDMA, high-performance networking technologies, and network architecture.
  • Experience with Deep Learning frameworks such as PyTorch and TensorFlow.

NVIDIA offers a diverse, supportive environment where everyone is inspired to do their best work. Join the team and make a lasting impact on the world.

Last updated 2 months ago

Responsibilities For Software Engineering Manager - GPU Communications Libraries

  • Lead and mentor library engineering team
  • Plan and execute projects
  • Participate in feature design and implementation
  • Interact with partners to understand requirements
  • Collaborate on product roadmap
  • Identify improvement opportunities in processes and infrastructure

Requirements For Software Engineering Manager - GPU Communications Libraries

  • 10+ years of experience in the software industry, specializing in HPC networking or system software
  • 4+ years of management experience
  • BS, MS, or Ph.D. in CS, CE, EE, or a related field
  • Experience in systems software or communication runtime development
  • Strong understanding of computer system architecture and OS principles
  • Excellent C/C++ programming and debugging skills in Linux
  • Ability to balance multiple projects
  • Effective communication across teams and time zones

Benefits For Software Engineering Manager - GPU Communications Libraries

  • Equity
  • Benefits
