HPC Operations Manager – Hardware Engineering

World leader in accelerated computing, pioneering AI and digital twins technology.
$272,000 - $425,500
DevOps
Staff Software Engineer
In-Person
5,000+ Employees
15+ years of experience
AI · Enterprise SaaS

Description For HPC Operations Manager – Hardware Engineering

NVIDIA, a pioneer in GPU technology and a leader in high-performance computing (HPC), AI, and visualization, is seeking an HPC Operations Manager for its Hardware Engineering team. This role combines technical leadership with infrastructure management, focusing on developing and maintaining the global HPC clusters used for GPU and SoC design. The position requires extensive experience with Linux systems, HPC environments, and team leadership, and offers a base salary range of $272,000 - $425,500 USD plus equity and benefits.

The role involves leading a multinational team of sysadmins and DevOps engineers, ensuring optimal performance of HPC clusters, and working closely with hardware engineering teams. Key responsibilities include managing HPC scheduler (LSF) operations, planning hardware deployments, and driving infrastructure evolution. The ideal candidate will have 15+ years of experience, with a strong background in Linux servers, NFS storage, and datacenter operations.

This is an exceptional opportunity to join one of technology's most desirable employers, working at the forefront of GPU development and AI innovation. The position offers the chance to impact future chip design through infrastructure leadership, while working with cutting-edge technologies in HPC and cloud computing. NVIDIA's commitment to diversity and innovation makes this an ideal role for experienced technical leaders looking to shape the future of computing technology.


Responsibilities For HPC Operations Manager – Hardware Engineering

  • Lead and mentor a multinational team of sysadmins and DevOps engineers
  • Ensure the highest reliability of HPC clusters
  • Develop critical metrics and program schedules
  • Identify failures and lead retrospective analysis
  • Evaluate latest technologies and recommend infrastructure evolution
  • Plan hardware deployments and refreshes
  • Work with hardware engineering leaders to support chip design needs
  • Lead all aspects of the HPC scheduler (LSF)
  • Track software licensing servers
  • Develop and manage program schedules and deliverables
  • Communicate program status to senior management

Requirements For HPC Operations Manager – Hardware Engineering

  • B.S. or M.S. in Computer Science, Computer Engineering, or Information Science (or equivalent experience)
  • 15+ years overall experience
  • 5+ years managing IT infrastructure teams of 10+ people
  • 10+ years of experience running Linux servers, NFS storage, and Ethernet networks
  • Knowledge of HPC schedulers (IBM LSF preferred)
  • Knowledge of hardware design workflows (EDA tools and methodology)
  • Experience using project management and capacity planning software
  • Experience with datacenter operations (rack and stack, maintenance)

Benefits For HPC Operations Manager – Hardware Engineering

  • Equity
  • Benefits package available
