Software Engineering Manager - Cloud Infrastructure Services, DGX Cloud

World leader in accelerated computing, pioneering AI and digital twins technology.
$200,000 - $385,250
Cloud
Staff Software Engineer
Remote
10+ years of experience
AI · Enterprise SaaS · Cloud

Description For Software Engineering Manager - Cloud Infrastructure Services, DGX Cloud

NVIDIA, known as "the AI computing company," is seeking a Site Reliability Engineering leader to manage the operations of its observability platform for multi-colo distributed NVIDIA GPU cloud clusters. The role sits on the team behind NVIDIA GPU Cloud (NGC), a GPU-accelerated platform that enables data scientists and researchers to build, train, and deploy neural network models for complex AI challenges.

The position requires a seasoned leader who will manage all aspects of cluster operational excellence and team growth. The ideal candidate should thrive in a fast-paced iterative engineering environment and have extensive experience delivering scalable distributed systems. This role involves working across various domains including information retrieval, artificial intelligence, natural language processing, distributed computing, and large-scale system design.

As a manager, you'll be responsible for guiding the team in solving reliability challenges for both internal and external-facing systems. The role offers the opportunity to work with cutting-edge technology in AI and deep learning, while leading a team of skilled engineers. You'll collaborate with product management teams, drive technical projects, and contribute to the strategic direction of DGX Cloud Computing Services.

The position offers competitive compensation ranging from $200,000 to $385,250 USD, along with equity and comprehensive benefits. NVIDIA provides an inclusive work environment and values diversity in its workforce, making this an excellent opportunity for leaders who want to make an impact in the AI and cloud computing space.

Last updated 2 days ago

Responsibilities For Software Engineering Manager - Cloud Infrastructure Services, DGX Cloud

  • Manage a team of Site Reliability Engineers, including task planning and code reviews
  • Define team strategy and roadmap for DGX Cloud Computing environment
  • Drive technical projects and provide leadership
  • Work closely with product management teams
  • Contribute technically to DGX Cloud Computing Services projects
  • Interact with key stakeholders for operational and financial clarity
  • Drive decision making and operational rigor across business analytics initiatives

Requirements For Software Engineering Manager - Cloud Infrastructure Services, DGX Cloud

Linux
Python
Go
Kubernetes
  • 10+ years of engineering experience with 3+ years of leadership
  • Bachelor's or Master's degree in Computer Science, or equivalent experience
  • Experience with containers, virtualization environments, and cluster solutions
  • Strong knowledge of Unix/Linux
  • Experience with Perl, Python, or Go
  • Experience in designing and implementing large-scale distributed systems
  • Demonstrated people management and leadership skills
  • Experience implementing tools, processes, and internal instrumentation

Benefits For Software Engineering Manager - Cloud Infrastructure Services, DGX Cloud

  • Equity
  • Comprehensive benefits package
