Software Engineering Manager - Cloud Infrastructure Services, DGX Cloud

NVIDIA

World leader in accelerated computing, pioneering AI and digital twin technology.

San Francisco, CA, USA

$200,000 - $391,000

Cloud

Staff Software Engineer

Remote

10+ years of experience

AI · Enterprise SaaS

Description For Software Engineering Manager - Cloud Infrastructure Services, DGX Cloud

NVIDIA, the pioneering force behind modern AI computing, is seeking a Site Reliability Engineering leader to manage its DGX Cloud Computing operations. The role sits at the intersection of AI technology and cloud infrastructure and oversees the observability platform for NVIDIA GPU cloud clusters distributed across multiple colocation sites.

The position offers an opportunity to work with world-class software engineers on NVIDIA GPU Cloud (NGC), a GPU-accelerated platform that enables data scientists and researchers to build, train, and deploy neural network models for complex AI challenges. As a leader, you'll own all aspects of cluster operational excellence, manage a team of Site Reliability Engineers, and drive technical projects in an innovative, fast-paced environment.

The role requires a strong technical background with 10+ years of engineering experience and 3+ years of leadership experience. You'll be working with cutting-edge technologies including Kubernetes, OpenStack, Docker, and observability tools like Grafana, OpenTelemetry, and Prometheus. The position offers exposure to various domains such as information retrieval, artificial intelligence, natural language processing, and distributed computing.

NVIDIA offers competitive compensation with a base salary range of $200,000 - $391,000, plus equity benefits. The company is committed to fostering a diverse work environment and values creative, autonomous engineers with a passion for technology. This role provides an exceptional opportunity to lead and influence the direction of cloud infrastructure services at one of the world's leading AI computing companies.

The ideal candidate combines technical expertise in distributed systems and cloud infrastructure with strong leadership, mentoring team members while driving technical excellence. The projects you'll work on directly affect NVIDIA's cloud computing capabilities.

Last updated 3 months ago

Responsibilities For Software Engineering Manager - Cloud Infrastructure Services, DGX Cloud

Manage a team of Site Reliability Engineers, including task planning and code reviews
Define team strategy and roadmap for the DGX Cloud Computing environment
Drive technical projects and provide leadership
Work closely with product management teams
Contribute technically to DGX Cloud Computing Services projects
Interact with key stakeholders for operational and financial clarity
Drive decision making and operational rigor across business analytics initiatives

Requirements For Software Engineering Manager - Cloud Infrastructure Services, DGX Cloud

Linux

Python

Kubernetes

10+ years of engineering experience and 3+ years of leadership experience
Bachelor's or Master's degree in Computer Science, or equivalent experience
Experience with containers, virtualization environments, and cluster solutions
Strong knowledge of Unix/Linux
Experience in Perl, Python, or Go
Experience in designing and implementing large-scale distributed systems
Demonstrated people management and leadership skills
Ability to quickly learn and evaluate new technologies
Ability to influence and establish relationships with other software and IT functional groups

Benefits For Software Engineering Manager - Cloud Infrastructure Services, DGX Cloud

Equity
