Hardcore Engineer - Pretraining Infrastructure

xAI

xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.

San Francisco, CA, USA

$180,000 - $440,000

Backend

In-Person

This job posting may no longer be active. You may be interested in these related jobs instead:

Description For Hardcore Engineer - Pretraining Infrastructure

xAI is seeking a Hardcore Engineer for Pretraining Infrastructure. The role involves designing, building, and implementing large-scale distributed training systems, profiling, debugging, and optimizing multi-host GPU utilization, hardware/software/algorithm co-design, maintaining and innovating on the codebase, and building tools to boost team productivity. The ideal candidate should have experience in configuring and troubleshooting operating systems for maximum performance, and building scalable training frameworks for AI models in HPC clusters. The team operates with a flat organizational structure, encouraging engineers to work across multiple areas and contribute directly to the company's mission. Strong communication skills and the ability to share knowledge concisely and accurately are essential. The interview process includes a coding assessment, systems hands-on demonstration, project deep-dive presentation, and a meet and greet with the wider team. xAI values engineering excellence, curiosity, and a strong work ethic.

Last updated 9 months ago

Responsibilities For Hardcore Engineer - Pretraining Infrastructure

Design, build, and implement large-scale distributed training systems
Profiling, debugging, and optimizing multi-host GPU utilization
Hardware / Software / Algorithm co-design
Maintain and innovate on the codebase
Build tools to boost the productivity of the team

Requirements For Hardcore Engineer - Pretraining Infrastructure

Python

Rust

Experience in configuring and troubleshooting operating systems for maximum performance
Built scalable training framework for AI models in HPC clusters
Experience with scalable orchestration framework and tools
Knowledge of machine learning compilers and runtime such as XLA, MLIR, and Triton
Experience with distributed training strategies such as FSDP, Megatron, and pipeline parallelism
Familiarity with NCCL or custom communication libraries for performant communication collectives
Strong communication skills

xAI

xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.

San Francisco, CA, USA

$180,000 - $440,000

Backend

In-Person