Tesla's Supercomputing/AI infrastructure team develops and maintains the infrastructure behind the company's machine learning operations, supporting projects such as Full Self-Driving (FSD), Tesla Bot, and the Dojo supercomputer. As a Site Reliability Engineer, you'll maintain and enhance the platform that powers Tesla's AI initiatives. The role combines high-performance computing expertise with infrastructure automation, focusing on GPU and Dojo platforms.
The position offers an exciting opportunity to work with cutting-edge technology in autonomous driving and robotics. You'll be responsible for managing AI infrastructure, optimizing performance, and ensuring the reliability of systems that enable neural network training at scale. The role requires strong technical skills in Python, Golang, and Linux systems, along with experience in modern DevOps practices and tools.
Tesla offers a comprehensive benefits package including competitive salary, equity opportunities, and extensive health coverage. The company's mission to accelerate the world's transition to sustainable energy makes this an impactful role where your work will directly contribute to advancing autonomous driving technology and robotics development.
Working at Tesla means joining a team of innovative professionals pushing the boundaries of technology in the automotive and AI fields. The role provides opportunities for growth and learning while working with some of the most advanced computing systems in the industry. If you're passionate about infrastructure automation and system reliability, and want to be part of revolutionizing transportation and robotics, this position offers an ideal opportunity to make a significant impact.