NVIDIA is seeking a Senior HPC DevOps Engineer to contribute to building next-generation supercomputers and HPC clusters. This role is at the intersection of artificial intelligence and GPU computing, where you'll drive breakthrough innovations in at-scale system design. You'll work with cutting-edge Accelerated computing and Deep Learning platforms, collaborating with scientific researchers, developers, and customers to enhance workflows and develop innovative solutions.
The position involves designing and maintaining large-scale HPC/AI clusters, implementing infrastructure as code, and developing automated CI/CD pipelines. You'll be responsible for creating automation scripts, deploying monitoring solutions, and performing complex troubleshooting from bare metal to application level. As a technical leader, you'll share best practices and drive innovation through R&D activities.
The ideal candidate brings 5+ years of experience with a strong background in HPC and AI technologies, including expertise in CPUs, GPUs, and high-speed interconnects. You should be proficient in programming, familiar with tools like Jenkins and Ansible, and have deep knowledge of both Windows and Linux environments. Experience with job scheduling, storage solutions, and cloud platforms is essential.
NVIDIA offers a competitive package and a diverse, inclusive work environment. You'll be part of a company that's revolutionizing industries through AI and High-Performance Computing, working with the latest technologies and brilliant minds in the field. This role provides an opportunity to shape the future of computing while working on some of the most challenging technical problems in the industry.