Senior HPC DevOps Engineer

NVIDIA is the world leader in accelerated computing, pioneering solutions in AI and digital twins.
DevOps
Senior Software Engineer
In-Person
5+ years of experience
AI · Enterprise SaaS

Description For Senior HPC DevOps Engineer

NVIDIA is seeking an experienced Senior HPC DevOps Engineer to help build the supercomputers and HPC clusters of the future. This role is crucial to driving advances in artificial intelligence and GPU computing. You'll work with cutting-edge accelerated computing and deep learning platforms, collaborating with scientific researchers, developers, and customers to improve workflows and develop new solutions.

The position involves designing and maintaining large-scale HPC/AI clusters, implementing infrastructure as code, and developing CI/CD pipelines. You'll be responsible for automation, monitoring, and troubleshooting complex systems from bare metal up to the application level. The role requires expertise in HPC and AI technologies, programming and scripting languages, and tools including Jenkins, Ansible, and Kubernetes.

The ideal candidate will have at least 5 years of experience, a strong technical background in computer science or engineering, and deep knowledge of both Windows and Linux environments. Experience with cloud platforms, virtual systems, and storage solutions is essential. Knowledge of GPU architecture, container technologies, and RDMA fabrics would be particularly valuable.

At NVIDIA, you'll be part of a team pushing technology boundaries and making real-world impact. The company values diversity and inclusion, providing an environment where innovation thrives. This role offers the opportunity to work with state-of-the-art technology while contributing to the future of computing and artificial intelligence.


Responsibilities For Senior HPC DevOps Engineer

  • Design, implement, and maintain large-scale HPC/AI clusters
  • Utilize and develop tools to manage infrastructure as code
  • Develop and maintain CI/CD pipelines
  • Develop automation scripts and tools
  • Deploy advanced monitoring solutions
  • Perform comprehensive troubleshooting
  • Serve as a technical resource
  • Support R&D activities and engage in proofs of concept (POCs) and proofs of value (POVs)

Requirements For Senior HPC DevOps Engineer

Key skills: Linux, Kubernetes
  • B.Sc. in Computer Science, Engineering, or a related field with 5+ years of experience
  • Deep knowledge of HPC and AI solution technologies
  • Advanced proficiency in programming and scripting languages
  • Familiarity with Jenkins, Ansible, Puppet/Chef
  • Excellent knowledge of Windows and Linux
  • Deep understanding of networking protocols
  • Experience with workload scheduling and orchestration tools
  • Experience with multiple storage solutions
  • Expertise with virtual systems
  • Familiarity with cloud platforms
