OpenAI is seeking a Research Infrastructure Engineer to join its post-training team, which focuses on transforming large pre-trained models into user-friendly chatbots such as ChatGPT. The role demands deep technical expertise in ML systems optimization and distributed systems. Based in San Francisco with a hybrid work model (three days per week in office), the position offers a competitive salary range of $310K-$460K plus equity and comprehensive benefits.
The role involves working across the entire technology stack, from optimizing low-level ML systems to managing job orchestration and data evaluation. You'll be responsible for building cutting-edge infrastructure and tools fundamental to ChatGPT's post-training phase. The team collaborates closely with research groups, creating systems that push the boundaries of what's possible with ChatGPT.
Key responsibilities include ensuring smooth operation of ChatGPT training systems, debugging complex ML codebases, building data management tools, and creating reusable Python libraries. You'll work on projects like profiling large model reinforcement learning training, identifying experiment failures, and redesigning data pipelines for multimodal data.
The ideal candidate has experience with Python, Kubernetes, distributed infrastructure, GPUs, and large-scale data systems; knowledge of reinforcement learning and transformers is essential. While research experience isn't mandatory, experience collaborating with ML researchers in an applied setting is highly valued.
OpenAI offers an exceptional benefits package including medical/dental/vision insurance, mental health support, 401(k) matching, generous parental leave, and learning stipends. The company is committed to diversity and equality, and to ensuring that AI benefits all of humanity. This is an opportunity to shape the future of AI technology while working with cutting-edge systems and brilliant minds in the field.