Site Reliability Engineer, AI Infrastructure

Tesla is an automotive and technology company pioneering electric vehicles and AI infrastructure development.
$120,000 - $300,000
Site Reliability
Senior Software Engineer
In-Person
3+ years of experience
AI · Automotive · Robotics

Description For Site Reliability Engineer, AI Infrastructure

Tesla's Supercomputing/AI Infrastructure team builds and maintains the infrastructure behind the company's machine learning operations, supporting projects such as Autopilot, Tesla Bot, and the Dojo supercomputer. As a Site Reliability Engineer, you'll manage and optimize the AI infrastructure that powers Tesla's autonomous driving and robotics initiatives.

The role combines high-performance computing expertise with site reliability engineering, requiring skills in Python, Golang, and Linux systems. You'll be working with cutting-edge technology, including GPU clusters and the Dojo platform, while ensuring the reliability and efficiency of systems that enable neural network training at scale.

This position offers an opportunity to shape the future of autonomous driving and robotics technology. You'll join a team advancing AI infrastructure, working on projects that contribute directly to Tesla's autonomous systems and to its mission of accelerating the world's transition to sustainable energy.

Compensation ranges from $120,000 to $300,000 annually, plus additional cash and stock awards. Tesla offers comprehensive benefits including medical, dental, and vision coverage, 401(k) matching, and various family-friendly benefits. The role is based in the San Francisco Bay Area and is in-person.

This is an ideal position for a seasoned engineer who wants to work on challenging technical problems at scale, with direct impact on Tesla's most ambitious projects in autonomous driving and robotics. The role requires both technical expertise and operational excellence, offering significant growth opportunities in the rapidly evolving field of AI infrastructure.

Last updated 4 days ago

Responsibilities For Site Reliability Engineer, AI Infrastructure

  • Support AI/ML cluster infrastructure on GPU and Dojo platforms
  • Improve monitoring and self-healing pipelines, and strengthen security posture
  • Optimize server, storage and network performance
  • Develop new tools in Python, Golang or Bash/Shell
  • Use Infrastructure as Code best practices
  • Participate in 24x7 on-call rotation

Requirements For Site Reliability Engineer, AI Infrastructure

Python
Go
Linux
Kubernetes
  • Proficiency in Python, Golang and/or Bash
  • Proficiency with Linux fundamentals and performance optimizations
  • Experience with configuration management software (Ansible, etc.)
  • Experience with container orchestration technologies such as Kubernetes
  • Bachelor's Degree in Computer Science, Computer Engineering, Electrical Engineering, Physics or proof of exceptional skills
  • 3+ years of additional equivalent experience

Benefits For Site Reliability Engineer, AI Infrastructure

401k
Medical Insurance
Dental Insurance
Vision Insurance
Mental Health Assistance
Parental Leave
Commuter Benefits
  • Aetna PPO and HSA plans with $0 payroll deduction
  • Family-building, fertility, adoption and surrogacy benefits
  • Dental and vision plans with $0 paycheck contribution
  • Company Paid HSA Contribution
  • Healthcare and Dependent Care Flexible Spending Accounts
  • LGBTQ+ care concierge services
  • 401(k) with employer match
  • Employee Stock Purchase Plans
  • Company-paid basic life, AD&D, short-term and long-term disability insurance
  • Employee Assistance Program
  • Sick and Vacation time
  • Back-up childcare and parenting support
  • Commuter benefits
  • Employee discounts and perks program
