Sr. Site Reliability Engineer, Simulation Cluster Infrastructure

Tesla is a leading electric vehicle manufacturer revolutionizing sustainable transportation and energy solutions.
$140,000 - $300,000
Site Reliability
Senior Software Engineer
In-Person
5,000+ Employees
3+ years of experience
Automotive

Description For Sr. Site Reliability Engineer, Simulation Cluster Infrastructure

Tesla's Simulations Team is seeking a Senior Site Reliability Engineer to lead its Simulation Cluster Infrastructure. The role is central to managing large-scale software infrastructure that simulates billions of miles of vehicle driving daily.

The role involves leading major initiatives such as implementing new-generation Processor-in-the-Loop cluster infrastructure, managing Bazel/Buck Remote Execution clusters, and establishing distributed observability at scale. You will work with Kubernetes and Tesla's internal orchestration systems to build reliable infrastructure.

As a Sr. SRE, you will join a team of engineers dedicated to revolutionizing electric vehicle production through advanced software development. The position requires strong experience with Linux internals and Kubernetes, and proficiency in a language such as Python, Rust, Go, or C++. You will be responsible for designing and implementing observability infrastructure, automating deployment processes, and helping shape a scalable, cloud-first service architecture.

Compensation ranges from $140,000 to $300,000 annually, plus additional cash and stock awards. Tesla offers comprehensive benefits including medical, dental, and vision coverage, 401(k) matching, stock purchase plans, and family-building and back-up childcare benefits.

Last updated 2 months ago

Responsibilities For Sr. Site Reliability Engineer, Simulation Cluster Infrastructure

  • Lead the bring-up of major software infrastructure initiatives for Software-in-the-Loop simulations at scale
  • Lead the bring-up of Bazel Remote Execution and Content Addressable Storage platforms
  • Design and implement observability infrastructure across multiple distributed systems
  • Influence architectural decisions for scalable, cloud-first service architecture
  • Automate software release roll-out processes to Kubernetes

Requirements For Sr. Site Reliability Engineer, Simulation Cluster Infrastructure

Python, Rust, Go, Kubernetes, Linux
  • 3+ years of relevant technical experience
  • Experience with Remote Execution in build systems using Bazel
  • Experience with Linux internals focusing on performance profiling
  • Advanced experience with Kubernetes
  • Proficiency in Python, Rust, Go, and/or C++
  • Experience in data-driven capacity planning
  • Troubleshooting and full-cycle incident response capabilities

Benefits For Sr. Site Reliability Engineer, Simulation Cluster Infrastructure

401(k), Medical Insurance, Dental Insurance, Vision Insurance, Mental Health Assistance, Parental Leave, Commuter Benefits
  • Medical plans with $0 payroll deduction
  • Family-building, fertility, adoption and surrogacy benefits
  • Dental and vision plans
  • HSA with company contribution
  • Healthcare and Dependent Care FSA
  • 401(k) with employer match
  • Employee Stock Purchase Plans
  • Life, AD&D, short-term and long-term disability insurance
  • Employee Assistance Program
  • Sick and Vacation time
  • Back-up childcare
  • Commuter benefits
  • Employee discounts

Jobs Related To Tesla Sr. Site Reliability Engineer, Simulation Cluster Infrastructure

Sr. Site Reliability Engineer, VMware, Infrastructure

Senior Site Reliability Engineer position at Tesla, focusing on VMware and Windows infrastructure management with emphasis on automation and system reliability.

Sr. Site Reliability Engineer, Energy

Senior Site Reliability Engineer position at Tesla, focusing on scaling and maintaining energy IoT infrastructure using Kubernetes, AWS, and modern tech stack.

Sr. Site Reliability Engineer, Energy

Senior Site Reliability Engineer position at Tesla, focusing on energy IoT infrastructure and systems scaling with competitive compensation and comprehensive benefits.

Site Reliability Engineer, AI Infrastructure

Senior Site Reliability Engineer position at Tesla, focusing on AI infrastructure maintenance and optimization for autonomous driving and robotics projects.

Sr. Site Reliability Engineer, Dojo

Senior Site Reliability Engineer position at Tesla, focusing on Dojo cluster infrastructure maintenance and optimization with competitive compensation and comprehensive benefits.