Senior DGX Cloud Software Engineer- Infrastructure Automation and Distributed Systems

NVIDIA is the world leader in accelerated computing, pioneering solutions in AI and digital twins.
$148,000 - $276,000
Cloud
Senior Software Engineer
Remote
5+ years of experience
AI · Enterprise SaaS

Description

NVIDIA, the world leader in accelerated computing, is seeking experienced software engineers to build and manage its private and public cloud infrastructure at production scale. This senior role focuses on infrastructure automation and distributed systems for the DGX Cloud platform, combining technical depth in cloud technologies with a strong emphasis on reliability and automation.

The role involves designing and implementing cloud infrastructure services, managing service level objectives, and driving automation initiatives. You'll be part of a team that ensures NVIDIA's accelerated computing infrastructure runs reliably and efficiently. Key responsibilities include system design, incident response, and consulting with peer teams on best practices.

This is an excellent opportunity for experienced engineers passionate about large-scale distributed systems and cloud infrastructure. The position offers competitive compensation ($148,000-$276,000) plus equity, and provides the flexibility of remote work. You'll be working with cutting-edge technology in AI and accelerated computing, contributing to systems that power some of the most advanced computing workloads in the industry.

NVIDIA's culture emphasizes creativity, autonomy, and technical innovation. The role requires a balance of technical expertise, collaborative skills, and strategic thinking. You'll be working with technologies like Kubernetes, Linux, and container systems, while having the opportunity to influence the architecture of next-generation cloud infrastructure.

Responsibilities

  • Design, build, and run cloud infrastructure services to meet business goals
  • Define internal-facing service level objectives (SLOs) and error budgets
  • Eliminate or automate toil where the ROI justifies it
  • Practice sustainable, blameless incident prevention and response
  • Participate in an on-call rotation
  • Consult with peer teams on systems design best practices

Requirements

Skills: Python, Go, Kubernetes, Linux
  • BS in Computer Science or a related technical field
  • 5+ years of relevant experience
  • Proficiency in Python or Go
  • Experience with infrastructure automation and distributed systems design
  • In-depth knowledge of Linux, Slurm, Kubernetes, networking, storage, and containers
  • Track record of project initiation and collaboration
  • Strong communication skills and systematic problem-solving approach

Benefits

  • Equity

Related Jobs at NVIDIA

Senior Software Engineer, Kubernetes - DGX Cloud

Senior Software Engineer position at NVIDIA focusing on Kubernetes development for DGX Cloud, working on GPU resource scheduling and cluster management for AI workloads.

Senior AI-HPC Storage Engineer

Senior AI-HPC Storage Engineer role at NVIDIA, focusing on designing and implementing advanced storage solutions for AI and high-performance computing environments.

Senior Software Engineer, Bare Metal Automation - DGX Cloud

Senior Software Engineer position at NVIDIA focusing on bare metal automation for DGX Cloud, managing GPU clusters and implementing monitoring systems for AI infrastructure.

Senior Cloud Platform Software Engineer

Senior Cloud Platform Software Engineer role at NVIDIA building scalable cloud services for AI workloads, requiring 12+ years of experience in platform engineering and expertise in Kubernetes.

Senior Software Engineer, Reliability and Operational Excellence - DGX Cloud

Senior Software Engineer position focused on reliability and operational excellence for NVIDIA's DGX Cloud platform.