HPC Engineer, AI Infrastructure

Tesla is an automotive and technology company developing electric vehicles, AI systems, and robotics solutions.
$133,440 - $355,920
Cloud
Senior Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Automotive · Robotics

Description For HPC Engineer, AI Infrastructure

Tesla's Supercomputing/AI infrastructure team develops and maintains the high-performance computing systems that power initiatives such as Full Self-Driving (FSD), Tesla Bot, and the Dojo supercomputer. As an HPC Engineer, you'll manage and optimize the AI infrastructure that enables neural network training at scale. The role combines expertise in Linux systems, GPU computing, and infrastructure automation to support Tesla's AI and robotics projects.

The position offers a unique opportunity to work with cutting-edge technology in autonomous driving and robotics, while managing hundreds of servers and GPU clusters. You'll be responsible for maintaining and improving the platform that enables Tesla's engineering teams to push the boundaries of AI and machine learning. The role requires both technical depth in HPC systems and the ability to collaborate across teams to ensure smooth operations.

Working at Tesla means joining a team that's revolutionizing multiple industries simultaneously. You'll receive comprehensive benefits including competitive salary, equity opportunities, and excellent healthcare coverage. The role offers significant growth potential as Tesla continues to expand its AI infrastructure and computational capabilities. If you're passionate about high-performance computing and want to contribute to transformative technologies in autonomous driving and robotics, this role presents an exceptional opportunity to make a meaningful impact.

Last updated 16 days ago

Responsibilities For HPC Engineer, AI Infrastructure

  • Support AI/ML cluster infrastructure on GPU and Dojo platforms
  • Improve monitoring & self-healing pipelines and security posture
  • Work with hardware and storage vendors to optimize server, storage and network performance
  • Performance tuning & OS provisioning on Linux systems
  • Manage HPC clusters, workloads and applications
  • Automation and systems engineering
  • Participate in 24x7 on-call rotation

Requirements For HPC Engineer, AI Infrastructure

  • Proficiency with scripting languages such as Python or Bash
  • Proficiency with Linux & network fundamentals
  • Experience with configuration management software, systems monitoring & alerting is a plus
  • Experience with high-throughput, low-latency networks and GPU-based computing systems preferred
  • Experience with Slurm, LSF and storage management of parallel file systems is a plus
  • Bachelor's Degree in Computer Science, Computer Engineering, Electrical Engineering, Physics or proof of exceptional skills
  • 3+ years of additional equivalent experience or evidence of exceptional ability

Benefits For HPC Engineer, AI Infrastructure

Medical Insurance
Dental Insurance
Vision Insurance
401k
Parental Leave
Commuter Benefits
  • Medical insurance with $0 payroll deduction options
  • Family-building, fertility, adoption and surrogacy benefits
  • Dental and vision plans with $0 paycheck contribution options
  • Company-paid HSA contribution
  • Healthcare and Dependent Care FSA
  • 401(k) with employer match
  • Employee Stock Purchase Plans
  • Company-paid basic life, AD&D, and disability insurance
  • Employee Assistance Program
  • Sick and vacation time
  • Back-up childcare and parenting support
  • Commuter benefits
  • Employee discounts and perks program

