Site Reliability Engineer, AI Infrastructure

Tesla is an automotive and technology company pioneering electric vehicles and AI infrastructure development.
$133,440 - $355,920
Site Reliability
Senior Software Engineer
In-Person
3+ years of experience
AI · Automotive · Robotics

Description For Site Reliability Engineer, AI Infrastructure

Tesla's Supercomputing/AI infrastructure team develops and maintains the infrastructure behind the company's machine learning operations, supporting projects such as Autopilot, Tesla Bot, and the Dojo supercomputer. As a Site Reliability Engineer, you will maintain and enhance the platform that powers Tesla's Full Self-Driving (FSD) and robotics initiatives. The role combines high-performance computing expertise with infrastructure automation and requires skills in Python, Golang, and Linux systems. You will work with GPU clusters and the Dojo platform while ensuring the performance and reliability of Tesla's AI infrastructure.

Compensation ranges from $133,440 to $355,920 annually, plus benefits including medical, dental, and vision coverage, a 401(k) with employer match, and an employee stock purchase plan. The role calls for a strong background in systems engineering and offers room to grow in AI infrastructure while contributing to Tesla's autonomous driving and robotics projects.

Responsibilities For Site Reliability Engineer, AI Infrastructure

  • Support AI/ML cluster infrastructure on GPU and Dojo platforms
  • Improve monitoring & self-healing pipelines and security posture
  • Optimize server, storage and network performance
  • Develop new tools in Python, Golang or Bash/Shell
  • Use Infrastructure as Code best practices
  • Participate in 24x7 on-call rotation

Requirements For Site Reliability Engineer, AI Infrastructure

  • Proficiency in Python, Golang and/or Bash
  • Proficiency with Linux fundamentals and performance optimizations
  • Experience with configuration management software (Ansible, etc.)
  • Experience with container orchestration technologies such as Kubernetes
  • Experience with systems monitoring & alerting (Prometheus, Grafana, Telegraf, Splunk, etc.)
  • Bachelor's Degree in Computer Science, Computer Engineering, Electrical Engineering, Physics or proof of exceptional skills
  • 3+ years of additional equivalent experience

Benefits For Site Reliability Engineer, AI Infrastructure

Medical Insurance
Dental Insurance
Vision Insurance
401k
Parental Leave
Mental Health Assistance
Commuter Benefits
  • Medical insurance with $0 payroll deduction options
  • Family-building, fertility, adoption and surrogacy benefits
  • Dental and vision plans with $0 paycheck contribution options
  • Company Paid HSA Contribution
  • Healthcare and Dependent Care FSA
  • 401(k) with employer match
  • Employee Stock Purchase Plans
  • Company paid Basic Life, AD&D, disability insurance
  • Employee Assistance Program
  • Sick and Vacation time
  • Back-up childcare
  • Commuter benefits
  • Employee discounts and perks program
