xAI is seeking a DevOps Engineer specializing in Supercomputing to join its team in the Bay Area. The role involves operating some of the world's largest GPU supercomputing clusters for AI training and serving production models. The ideal candidate will have experience with Kubernetes, Pulumi, Rust, Go, and Flux/ArgoCD.
The company operates with a flat organizational structure, encouraging all employees to be hands-on and contribute directly to the mission. Strong communication skills are essential, as is the ability to work across multiple areas of the company.
Key responsibilities include implementing Infrastructure as Code best practices, enhancing deployment pipelines, ensuring robust and secure service delivery, working with both on-premises clusters and cloud providers, and supporting security best practices for internal researchers and live external traffic.
Ideal experience includes writing scalable, highly available containerized applications in Rust and managing compute fleets with tools like Pulumi, Terraform, or Ansible.
The interview process consists of an initial interview followed by four technical interviews: a coding assessment, systems design, hands-on problem-solving, and a project deep-dive presentation.
xAI offers a salary range of $180,000 to $370,000 USD annually. The company values engineering excellence, curiosity, and a strong work ethic. The role suits a skilled DevOps engineer who wants to work on cutting-edge AI systems and contribute to understanding the universe.