Tesla is seeking a Senior Site Reliability Engineer to join its Machine Learning Operations and Infrastructure team. The role combines DevOps, MLOps, and cloud infrastructure expertise to support Tesla's engineering initiatives across AWS, Azure, and GCP. The position focuses on maintaining and improving the ML platform, ensuring robust deployment processes, and optimizing infrastructure for AI workloads.
As an SRE, you'll develop and automate deployment workflows, implement monitoring systems, and build self-healing processes. The role requires expertise in Kubernetes, machine learning operations, and modern DevOps practices. You'll work with current tools and frameworks while collaborating with cross-functional teams of data scientists and engineers.
The ideal candidate brings strong proficiency in Python, Go (Golang), and React, along with deep knowledge of ML infrastructure and cloud platforms. You'll play a central role in building and maintaining scalable systems for ML model training, deployment, and monitoring. The position offers exposure to projects in the automotive and AI domains.
The role sits at the intersection of site reliability engineering and machine learning and contributes to Tesla's mission of accelerating the world's transition to sustainable energy. You'll join a team that values technical excellence, innovation, and collaborative problem-solving, with competitive compensation and an extensive benefits package.