Senior DevOps and Automation Engineer, Fabric Networking - GPU

World leader in accelerated computing, pioneering AI and digital twin technology.
$148,000 - $287,500
DevOps
Senior Software Engineer
Remote
5+ years of experience
AI · Enterprise SaaS

Description For Senior DevOps and Automation Engineer, Fabric Networking - GPU

NVIDIA, the pioneer in accelerated computing and inventor of the GPU, is seeking a Senior DevOps and Automation Engineer for its Fabric Networking - GPU team. The role is central to developing and maintaining the software that enables GPU communication for High Performance Computing and Deep Learning solutions.

The position involves working with cutting-edge technology, including large GPU clusters interconnected via NVLink and InfiniBand. You'll be responsible for developing automated tools for cluster deployment, implementing modern DevOps practices, and ensuring optimal cluster performance. This role combines hands-on technical expertise with collaborative teamwork across multiple time zones.

The ideal candidate will bring strong expertise in automation with Ansible and Python, along with deep knowledge of Linux systems and cluster management. Experience with GPU-focused hardware and software, particularly DGX systems and compute clusters, would be highly valuable. The role offers exposure to groundbreaking developments in Artificial Intelligence and High Performance Computing.

NVIDIA offers a competitive compensation package, including a base salary range of $148,000 - $287,500 USD, equity, and comprehensive benefits. This is an opportunity to join a company at the forefront of AI and accelerated computing, working on technology that powers everything from artificial intelligence to autonomous vehicles. The position offers flexibility with remote work options while being part of a team that's driving innovation in the industry.


Responsibilities For Senior DevOps and Automation Engineer, Fabric Networking - GPU

  • Develop automated tools to deploy, provision, and maintain GPU clusters with NVLink and InfiniBand
  • Implement DevOps tools for software updates, maintenance, and cluster monitoring
  • Troubleshoot and resolve day-to-day cluster failures
  • Manage the rollout of cluster software and firmware updates
  • Collaborate with Engineering and Product Teams across multiple time zones

Requirements For Senior DevOps and Automation Engineer, Fabric Networking - GPU

Python · Linux · Kubernetes
  • BS or MS in Computer Science, Computer Engineering, Electrical Engineering, or related field
  • 5+ years of experience deploying and administering clusters, servers, and infrastructure
  • Expertise in Ansible, Python, and shell scripting
  • Deep understanding of operating systems, computer networks, and high-performance applications
  • Proven ability to work with cross-functional teams
  • Proficient with Linux fundamentals

Benefits For Senior DevOps and Automation Engineer, Fabric Networking - GPU

  • Equity

