Senior System Software Engineer, Distributed Systems - DGX Cloud

NVIDIA is the world leader in accelerated computing, pioneering solutions in AI and digital twins.
$148,000 - $230,000
Distributed Systems
Senior Software Engineer
Remote
6+ years of experience
AI · Enterprise SaaS · Cloud

Description For Senior System Software Engineer, Distributed Systems - DGX Cloud

NVIDIA, the world leader in accelerated computing, is seeking a Senior System Software Engineer to join its DGX Cloud team. This role combines distributed systems work with AI infrastructure development, offering an opportunity to shape the future of cloud computing.

The position involves designing and developing automated GPU asset provisioning systems across cloud providers, working with datacenter firmware, and ensuring seamless integration from hardware to AI applications. You'll be at the forefront of creating reliable, scalable solutions that power NVIDIA's AI infrastructure.

The ideal candidate brings 6+ years of Python development experience on Linux, expert-level knowledge of systems programming in Go and Python, and a deep understanding of distributed systems. You'll work with industry standards such as SPI, I2C, PCIe, UEFI, and PLDM, while collaborating with cross-functional teams across hardware, software, and infrastructure domains.

This is an exceptional opportunity for someone passionate about distributed systems and AI infrastructure. You'll work with some of the most forward-thinking professionals in the technology industry, tackling challenging problems that directly affect the advancement of AI applications. The role offers competitive compensation ($148,000-$230,000) plus equity, and can be performed remotely.

Join NVIDIA's team of creative, autonomous professionals who are transforming the world's largest industries through accelerated computing and AI innovation. Your contributions will be crucial in scaling up NVIDIA's AI infrastructure and delivering fault-resilient solutions at scale.

Responsibilities For Senior System Software Engineer, Distributed Systems - DGX Cloud

  • Design and architect a platform that automates GPU asset provisioning across cloud providers
  • Develop and optimize datacenter firmware solutions throughout their lifecycle
  • Work with hardware, software, and infrastructure teams
  • Define server-level reliability and availability requirements
  • Drive failure analysis and large-scale solution deployment
  • Ensure software integration from hardware to AI training applications

Requirements For Senior System Software Engineer, Distributed Systems - DGX Cloud

Python
Go
Linux
  • BS, MS, or PhD in EE/CS or related field
  • 6+ years of experience with Python development on Linux
  • Strong communication skills and ability to work with cross-functional teams
  • Knowledge of industry standards (SPI, I2C, PCIe, UEFI, PLDM)
  • Expert-level knowledge of systems programming (Go, Python)
  • Understanding of distributed systems, data synchronization, and fault tolerance
  • Experience with system firmware/software and platform management

Benefits For Senior System Software Engineer, Distributed Systems - DGX Cloud

  • Equity

Jobs Related To NVIDIA Senior System Software Engineer, Distributed Systems - DGX Cloud

Senior Interconnect Product Engineer

Senior Interconnect Product Engineer role at NVIDIA focusing on high-speed networking solutions, requiring 5+ years of experience in network debugging and product engineering.

Senior Distributed Storage Engineer

Senior Distributed Storage Engineer role at NVIDIA focusing on building scalable storage solutions for AI/ML applications with competitive compensation and benefits.

Systems Engineer, Enterprise

Senior Systems Engineer role at NVIDIA focusing on enterprise HPC server deployments, requiring 6+ years of experience in systems engineering and Linux expertise.

Senior Distributed Systems Engineer, AI Infrastructure

Senior Distributed Systems Engineer role at NVIDIA focusing on building exascale AI infrastructure for autonomous vehicles and deep learning platforms.

Senior Distributed Acceleration Engineer, RAPIDS

Senior Distributed Systems Engineer role at NVIDIA, focusing on GPU-accelerated data science and analytics pipelines, offering competitive compensation and remote work options.