NVIDIA is seeking an experienced Senior HPC DevOps Engineer to help build the supercomputers and HPC clusters of the future. This role is crucial in driving groundbreaking advancements in artificial intelligence and GPU computing. As a key player, you'll work with cutting-edge Accelerated computing and Deep Learning platforms, collaborating with scientific researchers, developers, and customers to improve workflows and develop innovative solutions.
The position involves designing and maintaining large-scale HPC/AI clusters, implementing infrastructure as code, and developing CI/CD pipelines. You'll be responsible for automation, monitoring, and troubleshooting complex systems from bare metal to application level. The role requires expertise in HPC and AI technologies, programming languages, and various tools including Jenkins, Ansible, and Kubernetes.
The ideal candidate will have at least 5 years of experience, strong technical background in computer science or engineering, and deep knowledge of both Windows and Linux environments. Experience with cloud platforms, virtual systems, and storage solutions is essential. Knowledge of GPU architecture, container technologies, and RDMA fabrics would be particularly valuable.
At NVIDIA, you'll be part of a team pushing technology boundaries and making real-world impact. The company values diversity and inclusion, providing an environment where innovation thrives. This role offers the opportunity to work with state-of-the-art technology while contributing to the future of computing and artificial intelligence.