Senior Software Engineer, Bare Metal Automation - DGX Cloud

World leader in accelerated computing, pioneering AI and digital twin technology to transform industries.
$148,000 - $276,000
Cloud
Senior Software Engineer
Hybrid
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Senior Software Engineer, Bare Metal Automation - DGX Cloud

NVIDIA, the world leader in accelerated computing, is seeking a Senior Software Engineer for its Bare Metal Automation team within DGX Cloud. This role is crucial for scaling up NVIDIA's AI infrastructure, with a focus on managing and automating large-scale GPU clusters. The position combines hardware expertise with software engineering and requires experience with bare metal hardware APIs and frameworks, particularly for GPU servers.

The role involves working with cutting-edge AI infrastructure, managing fleets of GPU nodes, and implementing sophisticated monitoring and health management systems. You'll be part of a team responsible for maintaining industry-leading reliability and performance of GPU clusters, working directly with NVIDIA's advanced computing technologies.

The ideal candidate brings 5+ years of experience in large-scale production systems, strong programming skills in languages like Go and Python, and a deep understanding of bare metal hardware automation. This position offers an opportunity to work at the forefront of AI computing, contributing to systems that power various AI workloads across industries.

NVIDIA offers a competitive compensation package, including a base salary range of $148,000-$276,000, equity, and comprehensive benefits. The company is known for its innovative culture and is consistently ranked as one of the technology world's most desirable employers. This role provides an excellent opportunity for those passionate about GPU hardware and AI infrastructure to make a significant impact in the field.


Responsibilities For Senior Software Engineer, Bare Metal Automation - DGX Cloud

  • Work on the DGX Cloud team, managing production systems for large-scale GPU clusters
  • Implement monitoring and health management capabilities for GPU assets
  • Manage fleets of GPU nodes
  • Work with cross-functional teams to ensure production AI clusters run reliably
  • Evaluate system failures and improve services through incident management

Requirements For Senior Software Engineer, Bare Metal Automation - DGX Cloud

Go
Python
Linux
  • 5+ years of experience in a similar role with large-scale production systems
  • BS in Computer Science, Engineering, Physics, Mathematics, or equivalent experience
  • Direct experience in software engineering with bare metal hardware APIs
  • Strong communication skills and ability to work with cross-functional teams
  • Proficiency in programming languages such as Go and Python
  • Solid understanding of data structures and algorithms
  • Experience with software engineering principles, tools, and techniques

Benefits For Senior Software Engineer, Bare Metal Automation - DGX Cloud

  • Equity
  • Comprehensive benefits package

