
Software Engineer - Data Infrastructure (Pretraining Data)

xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.
$180,000 - $440,000
Data
Senior Software Engineer
In-Person
11 - 50 Employees
5+ years of experience
AI
This job posting is no longer active. 😔

Job Description

xAI, an innovative AI company focused on creating systems to understand the universe, is seeking a Software Engineer specializing in Data Infrastructure for its Pretraining Data team. This role, based in the Bay Area (San Francisco and Palo Alto), offers an opportunity to work on cutting-edge AI systems at petabyte scale. The position involves building high-throughput data processing systems, managing large cloud compute clusters, and pre-processing datasets for AI training.

The ideal candidate will join a small, highly motivated team operating with a flat organizational structure, where leadership is earned through initiative and excellence. The role requires strong engineering skills, experience with multiple data modalities, and expertise in building distributed systems. You'll be working with technologies like Python, JAX, Rust, and Spark to create sophisticated data processing solutions.

The compensation package is highly competitive, ranging from $180,000 to $440,000 USD, complemented by equity and comprehensive benefits including medical, dental, and vision coverage, 401(k), and various insurance options. The company values hands-on contributors who can communicate effectively and thrive in a culture of curiosity and engineering excellence.

The interview process is thorough and technical, including coding assessments, a hands-on systems evaluation, and project presentations, ensuring that new team members align with xAI's high standards and mission-driven approach. This is an exceptional opportunity for engineers who want to contribute directly to advancing AI technology while working alongside passionate professionals in a cutting-edge environment.

Last updated a month ago

Responsibilities For Software Engineer - Data Infrastructure (Pretraining Data)

  • Building petabyte-scale, high-throughput data processing systems
  • Managing workloads across large cloud compute clusters
  • Pre-processing datasets for AI training

Requirements For Software Engineer - Data Infrastructure (Pretraining Data)

Python
  • Strong engineering skills and a passion for improving different aspects of data and models
  • Experience with one or more modalities other than text, with demonstrated exceptional work
  • Building bespoke data processing libraries from scratch
  • Designing and implementing distributed systems in Rust
  • Keeping up with state-of-the-art techniques for preparing AI training data
  • Organizing and meticulously bookkeeping data across multiple clouds, of multiple modalities, and from many sources

Benefits For Software Engineer - Data Infrastructure (Pretraining Data)

  • Comprehensive medical coverage
  • Vision coverage
  • Dental coverage
  • 401(k) retirement plan
  • Equity
  • Short & long-term disability insurance
  • Life insurance