
Machine Learning Performance Engineer

Wayve is the leading developer of Embodied AI technology, creating advanced AI software and foundation models for autonomous vehicles.
Machine Learning
Senior Software Engineer
Hybrid
5+ years of experience
AI · Automotive

Description For Machine Learning Performance Engineer

Wayve, founded in 2017, is at the forefront of Embodied AI technology for autonomous vehicles. We're seeking a Machine Learning Performance Engineer to join our Machine Learning Platform team, focusing on optimizing large-scale training jobs as we scale our models.

Key responsibilities include:

  • Maximizing the MFU (model FLOPs utilization) of large-scale training jobs
  • Profiling and identifying bottlenecks in training code
  • Implementing GPU kernels to improve training throughput
  • Collaborating with Research teams to integrate and test efficiency improvements
  • Managing and enhancing our GPU training clusters
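MFU (model FLOPs utilization) is the ratio of the floating-point operations per second a training job actually spends on the model's own math to the aggregate peak the GPUs can deliver. Below is a minimal sketch of that estimate. It assumes a dense transformer-style model, where training costs roughly 6 FLOPs per parameter per token (about 2 for the forward pass and 4 for the backward pass), and it ignores attention FLOPs. The function name and the example figures are hypothetical illustrations, not Wayve's models or numbers.

```python
def model_flops_utilization(n_params: float, tokens_per_second: float,
                            n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Estimate MFU as achieved model FLOP/s divided by aggregate peak FLOP/s.

    Assumption: ~6 * n_params FLOPs per training token (dense model,
    attention FLOPs ignored). Recomputed activations are NOT counted,
    since MFU only credits the model's own required math.
    """
    if min(n_params, tokens_per_second, n_gpus, peak_flops_per_gpu) <= 0:
        raise ValueError("all inputs must be positive")
    achieved_flops = 6 * n_params * tokens_per_second  # model FLOP/s actually done
    peak_flops = n_gpus * peak_flops_per_gpu            # hardware ceiling
    return achieved_flops / peak_flops


# Hypothetical job: 1B-parameter model, 100k tokens/s on 8 GPUs,
# each with a 312 TFLOP/s dense BF16 peak (the A100 spec figure).
mfu = model_flops_utilization(1e9, 1e5, 8, 312e12)
print(f"MFU: {mfu:.1%}")
```

With these made-up inputs the job reaches about 24% MFU. Real values depend on the model architecture and on which peak figure (precision mode, sparsity) is used as the denominator.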

The ideal candidate will have:

  • 5+ years of experience in performance optimization or ML engineering
  • Expertise in optimizing large-scale training jobs on GPU compute clusters
  • Experience working in platform teams and with research teams
  • Proficiency in benchmarking and reporting performance metrics
  • Strong Python coding skills
  • BS or MS in Machine Learning, Computer Science, Engineering, or related field

Desirable skills include experience with concurrent and distributed computing, NVIDIA Nsight Systems, GPU kernel implementation, and a deep understanding of computing fundamentals.

At Wayve, we value diversity and inclusivity. We offer a hybrid working model, combining office time for innovation and collaboration with the flexibility of working from home. Join us in creating autonomy that propels the world forward!

Last updated 10 months ago

Responsibilities For Machine Learning Performance Engineer

  • Maximising the MFU of our large-scale training jobs
  • Profiling and identifying bottlenecks in training code
  • Implementing GPU kernels to improve training throughput
  • Working closely with Research teams to integrate and test training efficiency improvements
  • Owning and improving our GPU training clusters

Requirements For Machine Learning Performance Engineer

Python · Linux
  • 5+ years of experience in performance optimization or ML engineering
  • Experience optimizing large-scale training jobs on GPU compute clusters
  • Experience working in platform teams and with research teams
  • Experience benchmarking performance, then tracking and reporting it over time in an open and accessible way
  • Ability to write high quality, well-structured and tested Python code
  • BS or MS in Machine Learning, Computer Science, Engineering, or a related technical discipline or equivalent experience
