Distributed Training Engineer, Sora

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity.
$295,000 - $440,000
Machine Learning
Staff Software Engineer
Hybrid
1,000 - 5,000 Employees
7+ years of experience

Description For Distributed Training Engineer, Sora

The Sora team at OpenAI is working on making video a key capability of OpenAI's foundation models. As a Distributed Training Engineer for Sora, you will improve the training throughput of our internal training framework and enable researchers to experiment with new ideas. This role requires strong engineering skills, the ability to write bug-free machine learning code, and deep knowledge of supercomputer performance.

Key responsibilities include:

  • Collaborating with researchers to develop systems-efficient video models and architectures
  • Applying the latest techniques to achieve impressive hardware efficiency for training runs
  • Profiling and optimizing the training framework

The ideal candidate has experience with multi-modal ML pipelines, strong software engineering skills (particularly in Python), experience understanding and optimizing training kernels, and a passion for understanding stable training dynamics.

OpenAI offers a competitive compensation package, including a salary range of $295K – $440K, generous equity, and comprehensive benefits such as medical insurance, mental health support, 401(k) matching, unlimited time off, and paid parental leave.

This role is based in San Francisco, CA, with a hybrid work model of 3 days in the office per week. OpenAI is committed to diversity, equality, and creating an inclusive environment for all employees.

Last updated 6 months ago

Responsibilities For Distributed Training Engineer, Sora

  • Collaborate with researchers to enable them to develop systems-efficient video models and architectures
  • Apply the latest techniques to our internal training framework to achieve impressive hardware efficiency for our training runs
  • Profile and optimize our training framework

Requirements For Distributed Training Engineer, Sora

  • Experience working with multi-modal ML pipelines
  • Strong software engineering skills and proficiency in Python
  • Experience understanding and optimizing training kernels
  • Passion for understanding stable training dynamics
  • Ability to dive deep into systems implementations to improve performance and maintainability

Benefits For Distributed Training Engineer, Sora

  • Medical, dental, and vision insurance for you and your family
  • Mental health and wellness support
  • 401(k) plan with 50% matching
  • Unlimited time off and 13 company holidays per year
  • Paid parental leave (20 weeks) and family-planning support
  • Annual learning & development stipend ($1,500 per year)
  • Equity
