Senior Software Engineer, Kubernetes - DGX Cloud

NVIDIA is the world leader in accelerated computing, pioneering GPU technology and AI solutions.
$148,000 - $339,250
Cloud
Senior Software Engineer
Remote
5+ years of experience
AI · Enterprise SaaS

Description For Senior Software Engineer, Kubernetes - DGX Cloud

NVIDIA is seeking experienced software engineers with Kubernetes expertise to scale its AI infrastructure. This role is part of the DGX Cloud team, which focuses on production systems that enable large, scalable GPU clusters for AI workloads. As a Senior Software Engineer, you'll develop custom software for GPU resource scheduling on Kubernetes and implement critical monitoring and health management capabilities.

The position offers an opportunity to work with cutting-edge technology at a company that pioneered visual computing and GPU technology. You'll be joining a team that's directly impacting the future of AI computing, working on systems that power a broad range of AI-based applications. The role combines deep technical expertise in Kubernetes and distributed systems with the excitement of working on industry-leading GPU technology.

The ideal candidate will bring 5+ years of experience in similar roles, strong software engineering principles, and expertise in Kubernetes APIs and frameworks. You'll be working in a collaborative environment, coordinating across organizational boundaries and geographies. The position offers competitive compensation ($148,000 - $339,250) plus equity, and the opportunity to work at one of technology's most desirable employers.

This is an excellent opportunity for someone passionate about Kubernetes, GPUs, and large-scale distributed systems who wants to make a significant impact in the AI computing space. You'll be part of a team that's pushing the boundaries of what's possible in AI infrastructure, working with some of the most forward-thinking professionals in the industry.

Last updated 9 days ago

Responsibilities For Senior Software Engineer, Kubernetes - DGX Cloud

  • Work on the DGX Cloud team managing production systems for scalable GPU clusters
  • Implement monitoring and health management capabilities for GPU resources
  • Develop custom software for scheduling GPU resources on Kubernetes
  • Work with teams across NVIDIA to ensure production AI clusters run reliably
  • Evaluate system failures and improve services through incident management

Requirements For Senior Software Engineer, Kubernetes - DGX Cloud

  • BS in Computer Science, Engineering, Physics, Mathematics or equivalent experience
  • 5+ years in a similar role with experience on large-scale production systems
  • Direct experience in software engineering with Kubernetes APIs and frameworks
  • Strong communication skills and ability to work with cross-functional teams
  • Technical knowledge of programming languages such as Go and Python
  • Solid understanding of data structures and algorithms

Benefits For Senior Software Engineer, Kubernetes - DGX Cloud

  • Equity

Jobs Related To NVIDIA Senior Software Engineer, Kubernetes - DGX Cloud

Senior DGX Cloud Software Engineer - Infrastructure Automation and Distributed Systems

Senior Cloud Engineer role at NVIDIA focusing on infrastructure automation and distributed systems for DGX cloud services.

Senior AI-HPC Storage Engineer

Senior AI-HPC Storage Engineer role at NVIDIA, focusing on designing and implementing advanced storage solutions for AI and high-performance computing environments.

Senior Software Engineer, Bare Metal Automation - DGX Cloud

Senior Software Engineer position at NVIDIA focusing on bare metal automation for DGX Cloud, managing GPU clusters and implementing monitoring systems for AI infrastructure.

Senior Cloud Platform Software Engineer

Senior Cloud Platform Engineer role at NVIDIA building scalable cloud services for AI workloads, requiring 12+ years of experience in platform engineering and expertise in Kubernetes.

Senior Software Engineer, Reliability and Operational Excellence - DGX Cloud

Senior Software Engineer position focused on reliability and operational excellence for NVIDIA's DGX Cloud platform.