Senior AI Cluster Tools Developer

World leader in accelerated computing, pioneering AI and digital twins technology transforming major industries.
$148,000 - $276,000
Machine Learning
Senior Software Engineer
Hybrid
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Senior AI Cluster Tools Developer

NVIDIA, the world leader in accelerated computing, is seeking a Senior AI Cluster Tools Developer to join its sophisticated analysis and debugging tools team. This role is crucial in empowering NVIDIA engineers to improve the performance and power efficiency of their products and applications. The position involves developing tools for GPU cluster users and administrators, working directly with Architecture and Software teams.

The role focuses on building internal performance/power profiling tools and platforms for AI workloads at cluster scale, along with developing debugging tools for GPU clusters. You'll collaborate with users to build and calibrate performance/power models for next-generation hardware and partner with architects to propose and improve hardware features based on real-world use cases.

The ideal candidate has strong software development experience with Python/Go/C++, a deep understanding of AI frameworks, and knowledge of cluster management systems. You'll be working in a dynamic environment where your expertise in GPU/CPU architecture, Deep Learning application performance analysis, and large AI job troubleshooting will be highly valued.

NVIDIA offers highly competitive salaries ($148,000-$276,000) and comprehensive benefits, including equity. You'll be joining a team of some of the most brilliant and talented people in the world, working on cutting-edge technology that transforms major industries through AI and digital twins innovation.

Last updated 19 days ago

Responsibilities For Senior AI Cluster Tools Developer

  • Build internal performance/power profiling and analysis tools for AI workloads at cluster scale
  • Develop debugging tools for common GPU cluster problems
  • Work with users to build/calibrate performance/power models for next-generation hardware
  • Partner with architects to propose new hardware features or improve existing features
  • Collaborate with Architecture and Software teams

Requirements For Senior AI Cluster Tools Developer

Python · Go · Linux · Kubernetes
  • BS+ in Computer Science or related field (or equivalent experience)
  • 5+ years of software development experience
  • Strong software design and implementation ability with Python/Go/C++
  • Good understanding of Deep Learning and AI frameworks such as PyTorch and TensorFlow
  • Knowledge of AI cluster job scheduling, storage management, and network management
  • Knowledge of Linux kernel
  • Excellent problem-solving and project management skills
  • Flexibility to work in an evolving environment

Benefits For Senior AI Cluster Tools Developer

  • Equity
  • Competitive Benefits Package

Jobs Related To NVIDIA Senior AI Cluster Tools Developer

Senior Software Engineer - Conversational AI

Senior Software Engineer position at NVIDIA focusing on building next-generation Conversational AI systems and Digital Human solutions using advanced Speech and LLM models.

Senior Software Engineer, Deep Learning Inference

Senior Software Engineer role at NVIDIA focusing on optimizing deep learning inference performance and implementing AI runtime solutions.

Senior System Software Engineer, Deep Learning Accelerator

Senior System Software Engineer role at NVIDIA focusing on Deep Learning Accelerator development, requiring 7+ years of experience in low-level software development and system architecture.

Deep Learning Engineer, End-to-end - Autonomous Driving

Senior Deep Learning Engineer position at NVIDIA focusing on end-to-end autonomous driving solutions, combining AI expertise with automotive technology.

Senior Software Engineer, TensorRT-LLM

Senior Software Engineer position at NVIDIA focusing on TensorRT-LLM development, requiring expertise in C++, deep learning, and AI inferencing optimization.