Software Engineering Manager - Cloud Infrastructure Services, DGX Cloud

World leader in accelerated computing, pioneering AI and digital twins technology.
$200,000 - $391,000
Cloud
Staff Software Engineer
Remote
10+ years of experience
AI · Enterprise SaaS

Description For Software Engineering Manager - Cloud Infrastructure Services, DGX Cloud

NVIDIA, the pioneering force behind modern AI computing, is seeking a Site Reliability Engineering leader to manage its DGX Cloud Computing operations. This role sits at the intersection of cutting-edge AI technology and cloud infrastructure, overseeing the observability platform for multi-colo distributed NVIDIA GPU cloud clusters.

The position offers an opportunity to work with world-class software engineers on NVIDIA's GPU Cloud (NGC), a GPU-accelerated platform that enables data scientists and researchers to build, train, and deploy neural network models for complex AI challenges. As a leader, you'll be responsible for all aspects of cluster operational excellence: managing a team of Site Reliability Engineers and driving technical projects in an innovative, fast-paced environment.

The role requires a strong technical background with 10+ years of engineering experience and 3+ years of leadership experience. You'll be working with cutting-edge technologies including Kubernetes, OpenStack, Docker, and observability tools like Grafana, OpenTelemetry, and Prometheus. The position offers exposure to various domains such as information retrieval, artificial intelligence, natural language processing, and distributed computing.

NVIDIA offers competitive compensation with a base salary range of $200,000 - $391,000, plus equity benefits. The company is committed to fostering a diverse work environment and values creative, autonomous engineers with a passion for technology. This role provides an exceptional opportunity to lead and influence the direction of cloud infrastructure services at one of the world's leading AI computing companies.

The ideal candidate will combine technical expertise in distributed systems and cloud infrastructure with strong leadership abilities, capable of mentoring team members while driving technical excellence. You'll be working on projects that directly impact NVIDIA's cloud computing capabilities, making this an excellent opportunity for those looking to make a significant impact in the AI and cloud computing space.

Last updated a month ago

Responsibilities For Software Engineering Manager - Cloud Infrastructure Services, DGX Cloud

  • Manage a team of Site Reliability Engineers, including task planning and code reviews
  • Define the team strategy and roadmap for the DGX Cloud Computing environment
  • Drive technical projects and provide leadership
  • Work closely with product management teams
  • Contribute technically to DGX Cloud Computing Services projects
  • Interact with key stakeholders for operational and financial clarity
  • Drive decision-making and operational rigor across business analytics initiatives

Requirements For Software Engineering Manager - Cloud Infrastructure Services, DGX Cloud

Linux · Python · Go · Kubernetes
  • 10+ years of engineering experience and 3+ years of leadership experience
  • Bachelor's or Master's degree in Computer Science, or equivalent experience
  • Experience with containers, virtualization environments, and cluster solutions
  • Strong knowledge of Unix/Linux
  • Experience with Perl, Python, or Go
  • Experience in designing and implementing large-scale distributed systems
  • Demonstrated people management and leadership skills
  • Ability to quickly learn and evaluate new technologies
  • Ability to influence and establish relationships with other software and IT functional groups

Benefits For Software Engineering Manager - Cloud Infrastructure Services, DGX Cloud

  • Equity
