DevOps Engineer - Supercomputing

xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.
$180,000 - $370,000
DevOps
Senior Software Engineer
Hybrid
5+ years of experience

Description For DevOps Engineer - Supercomputing

xAI is seeking a DevOps Engineer specializing in supercomputing to join its team in the Bay Area. The role involves operating some of the world's largest GPU supercomputing clusters, both for AI training and for serving production models. The ideal candidate has experience with Kubernetes, Pulumi, Rust, Go, and Flux or ArgoCD.

The company operates with a flat organizational structure, encouraging all employees to be hands-on and contribute directly to the mission. Strong communication skills are essential, as is the ability to work across multiple areas of the company.

Key responsibilities include implementing Infrastructure as Code (IaC) best practices, enhancing deployment pipelines, ensuring robust and secure service delivery, working with both on-premises clusters and cloud providers, and helping apply security best practices for internal researchers and live external traffic.

Ideal experience includes writing scalable, highly available containerized applications in Rust and managing compute fleets with tools such as Pulumi, Terraform, or Ansible.

The interview process consists of an initial interview followed by four technical interviews: a coding assessment, a systems design interview, a hands-on problem-solving session, and a project deep-dive presentation.

xAI offers a competitive salary range of $180,000 - $370,000 USD annually. The company values engineering excellence, curiosity, and a strong work ethic. This is an excellent opportunity for a skilled DevOps engineer looking to work on cutting-edge AI systems and contribute to understanding the universe.

Last updated 3 months ago

Responsibilities For DevOps Engineer - Supercomputing

  • Operate some of the world's largest GPU supercomputing clusters for both AI training and serving production models
  • Implement IaC best practices, enhance deployment pipelines, and ensure robust, secure service delivery across our production environments
  • Work with both on-premises clusters and cloud providers
  • Help with security best practices for internal researchers and live external traffic

Requirements For DevOps Engineer - Supercomputing

Kubernetes
Go
Rust
  • Writing scalable and highly available containerized applications in Rust
  • Managing compute fleets with Pulumi, Terraform, Ansible, or other stateful automation libraries

