NVIDIA, the world leader in accelerated computing, is seeking a Senior GPU Cluster Software Engineer to join its System Software team. This role focuses on building profiling solutions for large-scale applications running on GPU compute clusters, with the goal of optimizing performance and improving the user experience. The position spans distributed systems, machine learning, and high-performance computing.
As a senior engineer, you'll develop and maintain profiling tools that analyze real-world ML/DL applications on HPC GPU clusters. The role requires expertise in Python development, distributed systems architecture, and database management. You'll work with a modern technology stack that includes monitoring and visualization tools such as Kibana and Grafana, along with a range of databases.
The ideal candidate has 5+ years of software development experience, a strong understanding of distributed systems, and familiarity with machine learning concepts. The position offers meaningful, self-directed projects along with support and mentorship for professional growth. The hybrid work arrangement at NVIDIA's Shanghai office provides flexibility while preserving opportunities for collaboration.
Working at NVIDIA means being at the forefront of AI and digital twin technology, contributing to solutions that are transforming major industries. The role offers exposure to the latest GPU technology and the chance to collaborate with application owners and research teams to improve current- and future-generation GPU clusters.