HPC Engineer, AI Infrastructure

Tesla is an automotive and technology company developing electric vehicles, AI systems, and robotics solutions.
$120,000 - $300,000
Cloud
Senior Software Engineer
In-Person
3+ years of experience
AI · Automotive · Robotics

Description For HPC Engineer, AI Infrastructure

Tesla's Supercomputing/AI infrastructure team builds and maintains the high-performance computing systems that power initiatives such as Full Self-Driving (FSD), Tesla Bot, and the Dojo supercomputer. As an HPC Engineer, you'll manage and optimize the AI infrastructure that enables neural network training at scale. The role combines expertise in Linux systems, GPU computing, and infrastructure automation to support Tesla's ambitious goals in autonomous driving and robotics.

The position offers a unique opportunity to work with cutting-edge technology and directly impact the development of revolutionary products. You'll maintain and improve the platform that Tesla's engineering teams rely on, ensuring they have the computing resources and tools they need to push the boundaries of AI and automation. The role involves systems automation, configuration management, and performance optimization of complex computing clusters.

Working at Tesla means joining a team that's reshaping multiple industries through innovation. You'll collaborate with world-class engineers and scientists while having access to some of the most advanced computing infrastructure in the world. The company offers comprehensive benefits, competitive compensation including stock awards, and the chance to contribute to technologies that are transforming transportation and robotics. This role is perfect for someone who is passionate about high-performance computing, has strong Linux and automation skills, and wants to be part of Tesla's mission to accelerate the world's transition to sustainable energy.

Responsibilities For HPC Engineer, AI Infrastructure

  • Support AI/ML cluster infrastructure on GPU and Dojo platforms
  • Improve monitoring & self-healing pipelines and strengthen security posture
  • Work with hardware and storage vendors to optimize server, storage and network performance
  • Performance tuning & OS provisioning on Linux systems
  • Manage HPC clusters, workloads and applications
  • Automation and systems engineering
  • Participate in 24x7 on-call rotation

Requirements For HPC Engineer, AI Infrastructure

  • Proficiency with scripting languages such as Python or Bash
  • Proficiency with Linux & network fundamentals
  • Experience with configuration management software, systems monitoring & alerting is a plus
  • Experience with high-throughput, low-latency networks and GPU-based computing systems is preferred
  • Experience with Slurm or LSF, and with storage management of parallel file systems, is a plus
  • Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or Physics, or proof of exceptional skills
  • 3+ years of additional equivalent experience or evidence of exceptional ability

Benefits For HPC Engineer, AI Infrastructure

Medical Insurance
Dental Insurance
Vision Insurance
401k
Commuter Benefits
Parental Leave
  • Aetna PPO and HSA plans with $0 payroll deduction
  • Family-building, fertility, adoption and surrogacy benefits
  • Dental and vision plans with $0 paycheck contribution
  • Company-paid HSA contribution
  • Healthcare and Dependent Care Flexible Spending Accounts
  • 401(k) with employer match
  • Employee Stock Purchase Plans
  • Company-paid basic life, AD&D, short-term and long-term disability insurance
  • Employee Assistance Program
  • Sick and vacation time
  • Back-up childcare and parenting support resources
  • Commuter benefits
  • Employee discounts and perks program
