Site Reliability Engineer, AI Infrastructure

Tesla

Tesla is an automotive and technology company leading in electric vehicles and AI development.

San Francisco, CA, USA

$133,440 - $355,920

Site Reliability

Senior Software Engineer

In-Person

5,000+ Employees

3+ years of experience

AI · Automotive · Robotics

Description For Site Reliability Engineer, AI Infrastructure

Tesla's Supercomputing/AI infrastructure team develops and maintains the critical infrastructure behind the company's machine learning operations, supporting projects such as Full Self-Driving (FSD), Tesla Bot, and the Dojo supercomputer. As a Site Reliability Engineer, you'll maintain and enhance the platform that powers Tesla's AI initiatives. The role combines high-performance computing expertise with infrastructure automation, focusing on GPU and Dojo platforms.

The position offers an exciting opportunity to work with cutting-edge technology in autonomous driving and robotics. You'll be responsible for managing AI infrastructure, optimizing performance, and ensuring the reliability of systems that enable neural network training at scale. The role requires strong technical skills in Python, Golang, and Linux systems, along with experience in modern DevOps practices and tools.

Tesla offers a comprehensive benefits package including competitive salary, equity opportunities, and extensive health coverage. The company's mission to accelerate the world's transition to sustainable energy makes this an impactful role where your work will directly contribute to advancing autonomous driving technology and robotics development.

Working at Tesla means joining a team of innovative professionals pushing the boundaries of technology in the automotive and AI fields. The role provides opportunities for growth and learning while working with some of the most advanced computing systems in the industry. If you're passionate about infrastructure automation and system reliability and want to help revolutionize transportation and robotics, this position is an ideal opportunity to make a significant impact.

Last updated 2 months ago

Responsibilities For Site Reliability Engineer, AI Infrastructure

Support the AI/ML cluster infrastructure on both GPU and Dojo platforms
Improve monitoring & self-healing pipelines and security posture
Optimize server, storage and network performance
Develop new tools in Python, Golang or Bash/Shell
Use Infrastructure as Code best practices
Participate in 24x7 on-call rotation

Requirements For Site Reliability Engineer, AI Infrastructure

Proficiency in Python, Golang and/or Bash
Proficiency with Linux fundamentals and performance optimizations
Experience with configuration management software (Ansible, etc.)
Experience with container orchestration technologies such as Kubernetes
Bachelor's Degree in Computer Science, Computer Engineering, Electrical Engineering, Physics or proof of exceptional skills
3+ years of additional equivalent experience or evidence of exceptional ability

Benefits For Site Reliability Engineer, AI Infrastructure

Medical Insurance

Dental Insurance

Vision Insurance

401k

Mental Health Assistance

Parental Leave

Commuter Benefits

Aetna PPO and HSA plans with $0 payroll deduction
Family-building, fertility, adoption and surrogacy benefits
Dental and vision plans with $0 paycheck contribution
Company Paid HSA Contribution
Healthcare and Dependent Care Flexible Spending Accounts
401(k) with employer match
Employee Stock Purchase Plans
Company paid Basic Life, AD&D, short-term and long-term disability insurance
Employee Assistance Program
Sick and Vacation time
Back-up childcare and parenting support resources
Commuter benefits
Employee discounts and perks program
