NVIDIA, the world leader in accelerated computing, is seeking a Senior AI Cluster Tools Developer to join its sophisticated analysis and debugging tools team. The role is crucial to helping NVIDIA engineers improve the performance and power efficiency of their products and applications. The position involves developing tools for GPU cluster users and administrators and working directly with the Architecture and Software teams.
The role focuses on building internal performance/power profiling tools and platforms for AI workloads at cluster scale, along with developing debugging tools for GPU clusters. You'll collaborate with users to build and calibrate performance/power models for next-generation hardware and partner with architects to propose and improve hardware features based on real-world use cases.
The ideal candidate has strong software development experience in Python/Go/C++, a deep understanding of AI frameworks, and knowledge of cluster management systems. You'll be working in a dynamic environment where your expertise in GPU/CPU architecture, deep learning application performance analysis, and troubleshooting large AI jobs will be highly valued.
NVIDIA offers highly competitive salaries ($148,000-$276,000) and comprehensive benefits, including equity. You'll be joining a team of some of the most brilliant and talented people in the world, working on cutting-edge technology that is transforming major industries through innovation in AI and digital twins.