Site Reliability Engineer (SRE)

xAI is an AI company working on large-scale, highly reliable distributed systems and AI infrastructure.
Site Reliability
Senior Software Engineer
Hybrid

Description For Site Reliability Engineer (SRE)

xAI is seeking an experienced Site Reliability Engineer (SRE) to join its London team. The role focuses on improving observability, building dashboards and alerts, managing on-call rotations, and improving deployment processes. The ideal candidate is an expert in a language such as Rust, C++, or Go, and has deep knowledge of monitoring technologies, deployment tools, and Kubernetes. The position offers a dynamic startup environment and work on large-scale distributed systems, including the Grok production stack. Benefits include competitive compensation, equity, and health insurance. The role requires working from the London office, with occasional late meetings and business trips to California. Join xAI to tackle complex technical challenges and contribute to cutting-edge AI infrastructure.

Last updated 4 months ago

Responsibilities For Site Reliability Engineer (SRE)

  • Improving observability by adding/adjusting metrics
  • Building easily parsable dashboards
  • Building reliable alerts
  • Designing and overseeing on-call rotations
  • Improving the deployment process to increase reliability

Requirements For Site Reliability Engineer (SRE)

  • Expert in at least one programming language that compiles to machine code, such as Rust, C++, or Go (Rust or C++ preferred)
  • Expert knowledge of monitoring technologies such as Prometheus, Grafana, and PagerDuty
  • Expert knowledge of deployment technologies such as Pulumi or Terraform
  • Expert knowledge of Kubernetes

Benefits For Site Reliability Engineer (SRE)

  • Competitive cash-based compensation
  • xAI equity
  • Private health and dental insurance
  • Unlimited time off subject to prior approval
