Sr. Site Reliability Engineer, Dojo

Tesla is a leading electric vehicle and clean energy company pioneering sustainable transportation and energy solutions.
$120,000 - $228,000
Site Reliability
Senior Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI

Description For Sr. Site Reliability Engineer, Dojo

Tesla is seeking a Senior Site Reliability Engineer to join its Dojo cluster infrastructure team. This role combines technical expertise with customer support, focusing on maintaining and optimizing critical infrastructure systems. The salary range is $120,000 - $228,000, plus benefits.

The ideal candidate will be responsible for ensuring the reliability and performance of Tesla's Dojo cluster infrastructure, working with various teams and vendors to maintain seamless operations. Key responsibilities include managing customer support, troubleshooting complex systems, and implementing automation solutions for improved efficiency.

This role requires a minimum of 3 years of experience in SRE or infrastructure engineering, with strong Linux and networking knowledge. The position demands excellent problem-solving abilities and experience automating with tools such as Python and Ansible. The successful candidate will work from Tesla's Palo Alto location, collaborating with multiple teams to maintain and improve system reliability.

Tesla's benefits package includes comprehensive healthcare, a 401(k) with employer match, an employee stock purchase plan, and family-friendly benefits. The role suits an experienced SRE who wants to work on the infrastructure that powers Tesla's AI training systems.


Responsibilities For Sr. Site Reliability Engineer, Dojo

  • Respond to customer inquiries and resolve issues in a timely manner
  • Manage and prioritize change requests for cluster operations
  • Collaborate with third-party storage vendors to resolve issues and outages
  • Troubleshoot and debug storage-related problems
  • Work with network vendors to debug and resolve issues
  • Create visibility into network issues and implement monitoring tools
  • Collaborate with facility and operations teams for maintenance and upgrades
  • Ensure seamless communication during planned and unplanned outages
  • Troubleshoot and debug hardware issues through automation

Requirements For Sr. Site Reliability Engineer, Dojo

  • 3+ years of experience in an SRE or infrastructure engineering role
  • Strong understanding of Linux, networking, and storage systems
  • Excellent problem-solving and troubleshooting skills
  • Experience with automation using tools such as Ansible and Python
  • Strong communication and collaboration skills
  • Ability to work in a fast-paced environment
  • Familiarity with monitoring tools like Prometheus, Grafana, or ELK preferred
  • Experience with cloud-based infrastructure preferred

Benefits For Sr. Site Reliability Engineer, Dojo

Medical Insurance
Dental Insurance
Vision Insurance
401k
Mental Health Assistance
Parental Leave
Commuter Benefits
  • Medical plans with $0 payroll deduction
  • Family-building, fertility, adoption and surrogacy benefits
  • Dental and vision plans
  • Company-paid HSA contribution
  • Healthcare and Dependent Care FSA
  • 401(k) with employer match
  • Employee Stock Purchase Plans
  • Company-paid basic life, AD&D, and disability insurance
  • Employee Assistance Program
  • Sick and vacation time
  • Back-up childcare
  • Commuter benefits
  • Employee discounts

