HPC Operations Manager – Hardware Engineering

NVIDIA

World leader in accelerated computing, pioneering AI and digital twins technology.

San Francisco, CA, USA • Boston, MA, USA • Austin, TX, USA

$272,000 - $425,500

DevOps

Staff Software Engineer

In-Person

5,000+ Employees

15+ years of experience

AI · Enterprise SaaS

Description For HPC Operations Manager – Hardware Engineering

NVIDIA, a pioneer in GPU technology and a leader in High-Performance Computing, AI, and Visualization, is seeking an HPC Operations Manager for its Hardware Engineering team. This role combines technical leadership with infrastructure management, focusing on developing and maintaining the global HPC clusters that are crucial for GPU and SoC design. The position requires extensive experience in Linux systems, HPC environments, and team leadership, and offers a base salary range of $272,000 - $425,500 USD plus equity and benefits.

The role involves leading a multinational team of sysadmins and DevOps engineers, ensuring optimal performance of HPC clusters, and working closely with hardware engineering teams. Key responsibilities include managing HPC scheduler operations, planning hardware deployments, and driving infrastructure evolution. The ideal candidate will have 15+ years of experience, with a strong background in Linux servers, NFS storage, and datacenter operations.

This is an exceptional opportunity to join one of technology's most desirable employers, working at the forefront of GPU development and AI innovation. The position offers the chance to impact future chip design through infrastructure leadership, while working with cutting-edge technologies in HPC and cloud computing. NVIDIA's commitment to diversity and innovation makes this an ideal role for experienced technical leaders looking to shape the future of computing technology.

Last updated 2 months ago

Responsibilities For HPC Operations Manager – Hardware Engineering

Lead and mentor a multinational team of sysadmins and DevOps engineers
Ensure the highest reliability of HPC clusters
Develop critical metrics and program schedules
Identify failures and lead retrospective analysis
Evaluate latest technologies and recommend infrastructure evolution
Plan hardware deployments and refreshes
Work with hardware engineering leaders to support chip design needs
Lead all aspects of the HPC scheduler (LSF)
Track software licensing servers
Develop and manage program schedules and deliverables
Communicate program status to senior management

Requirements For HPC Operations Manager – Hardware Engineering

Linux

B.S. or M.S. in Computer Science, Computer Engineering, or Information Science (or equivalent experience)
15+ years of overall experience
5+ years managing IT infrastructure teams of 10+ people
10+ years of experience running Linux servers, NFS storage, and Ethernet networks
Knowledge of HPC schedulers (IBM LSF preferred)
Knowledge of hardware design workflows (EDA tools and methodology)
Experience using project management and capacity planning software
Experience with datacenter operations (rack and stack, maintenance)

Benefits For HPC Operations Manager – Hardware Engineering

Equity
Benefits package available
