Distributed Training Engineer, Sora

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity.
$295,000 - $440,000
Machine Learning
Staff Software Engineer
Hybrid
1,000 - 5,000 Employees
7+ years of experience

Description For Distributed Training Engineer, Sora

The Sora team at OpenAI is working on making video a key capability of OpenAI's foundation models. As a Distributed Training Engineer for Sora, you will work on improving the training throughput for our internal training framework and enable researchers to experiment with new ideas. This role requires strong engineering skills, the ability to write bug-free machine learning code, and deep knowledge of supercomputer performance.

Key responsibilities include:

  • Collaborating with researchers to develop systems-efficient video models and architectures
  • Applying the latest techniques to achieve impressive hardware efficiency for training runs
  • Profiling and optimizing the training framework

The ideal candidate has experience with multi-modal ML pipelines, strong software engineering skills (particularly in Python), experience understanding and optimizing training kernels, and a passion for understanding stable training dynamics.

OpenAI offers a competitive compensation package, including a salary range of $295K – $440K, generous equity, and comprehensive benefits such as medical insurance, mental health support, 401(k) matching, unlimited time off, and paid parental leave.

This role is based in San Francisco, CA, with a hybrid work model of 3 days in the office per week. OpenAI is committed to diversity, equality, and creating an inclusive environment for all employees.

Last updated 4 months ago

Responsibilities For Distributed Training Engineer, Sora

  • Collaborate with researchers to enable them to develop systems-efficient video models and architectures
  • Apply the latest techniques to our internal training framework to achieve impressive hardware efficiency for our training runs
  • Profile and optimize our training framework

Requirements For Distributed Training Engineer, Sora

  • Experience working with multi-modal ML pipelines
  • Strong software engineering skills and proficiency in Python
  • Experience understanding and optimizing training kernels
  • Passion for understanding stable training dynamics
  • Ability to dive deep into systems implementations to improve performance and maintainability

Benefits For Distributed Training Engineer, Sora

  • Medical, dental, and vision insurance for you and your family
  • Mental health and wellness support
  • 401(k) plan with 50% matching
  • Unlimited time off and 13 company holidays per year
  • Paid parental leave (20 weeks) and family-planning support
  • Annual learning & development stipend ($1,500 per year)
  • Equity
