Senior Software Engineer, Kubernetes - DGX Cloud

NVIDIA is the world leader in accelerated computing, pioneering GPU technology and AI solutions.
$148,000 - $339,250
Cloud
Senior Software Engineer
Remote
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Senior Software Engineer, Kubernetes - DGX Cloud

NVIDIA, the pioneer in visual computing and GPU technology, is seeking a Senior Software Engineer specializing in Kubernetes for its DGX Cloud team. This role is at the forefront of AI infrastructure development, focusing on scaling and optimizing GPU-powered computing systems. The position offers an opportunity to work with cutting-edge technology in AI computing, combining expertise in Kubernetes with GPU resource management.

The role involves developing and maintaining large-scale GPU clusters for AI workloads, implementing sophisticated monitoring systems, and ensuring optimal performance of production environments. You'll be working with Kubernetes APIs and frameworks, developing custom scheduling solutions for GPU resources, and implementing robust health management capabilities.

The ideal candidate brings 5+ years of experience in similar roles, deep knowledge of Kubernetes, and strong programming skills in Go or Python. You'll be joining a company that's leading the AI computing revolution, offering competitive compensation between $148,000 and $339,250 USD, plus equity and benefits.

This position offers a unique opportunity to impact the future of AI infrastructure, working with some of the most forward-thinking professionals in the industry. You'll be part of a team that values innovation, technical excellence, and collaborative problem-solving, while contributing to systems that power cutting-edge AI applications across various industries.

NVIDIA's culture encourages creativity, autonomy, and technical innovation, making it one of the most desirable employers in the technology sector. If you're passionate about Kubernetes and GPUs and want to be at the forefront of AI infrastructure development, this role offers an exciting opportunity to make a significant impact while working with industry-leading technology.

Responsibilities For Senior Software Engineer, Kubernetes - DGX Cloud

  • Work on the DGX Cloud team responsible for production systems that enable large, scalable GPU clusters for AI workloads
  • Implement monitoring and health management capabilities for GPU resources
  • Work on custom software related to scheduling GPU resources on Kubernetes
  • Ensure production AI clusters run reliably and consistently with maximum performance
  • Evaluate system failures and improve services based on the incident management process

Requirements For Senior Software Engineer, Kubernetes - DGX Cloud

Kubernetes
Python
Go
  • Direct experience in software engineering with Kubernetes APIs and frameworks
  • Strong communication skills and ability to work with multi-functional teams
  • 5+ years of experience in a similar role with large-scale production systems
  • BS in Computer Science, Engineering, Physics, Mathematics, or equivalent experience
  • Technical knowledge of a systems programming language such as Go or Python
  • Solid understanding of data structures and algorithms

Benefits For Senior Software Engineer, Kubernetes - DGX Cloud

  • Equity
  • Medical insurance
  • Comprehensive benefits package

Jobs Related To NVIDIA Senior Software Engineer, Kubernetes - DGX Cloud

Senior DGX Cloud Software Engineer - Infrastructure Automation and Distributed Systems

Senior Cloud Engineer role at NVIDIA focusing on infrastructure automation and distributed systems, offering competitive compensation and the opportunity to work with cutting-edge technology.

Senior System Software Engineer - Scientific Computing PaaS

Senior System Software Engineer position at NVIDIA focusing on building a scientific computing platform on DGX Cloud, requiring expertise in cloud computing and distributed systems.

Senior Software Engineer, Reliability and Operational Excellence - DGX Cloud

Senior Software Engineer position at NVIDIA focusing on reliability and operational excellence for DGX Cloud services.

Senior Software Engineer, Bare Metal Automation - DGX Cloud

Senior Software Engineer position at NVIDIA focusing on bare metal automation for DGX Cloud, managing large-scale GPU clusters for AI workloads.

Senior Software Engineer - HPC

Senior Software Engineer position at NVIDIA focusing on HPC infrastructure, requiring 10+ years of experience in designing and implementing large-scale distributed systems.