The Platform Networking team at OpenAI is responsible for the collective communication stack used in our largest training jobs. Using C++ and CUDA, we work on novel collective communication techniques that enable efficient training of our flagship models on our largest custom-built supercomputers.
As a Software Engineer, Networking, you will design and implement custom networking collectives that are tightly integrated into our training stack. We're looking for people with a background in low-level performance-critical software. Experience with collective communication is a bonus.
In this role, you will:
You might thrive in this role if you:
This role is based in San Francisco, CA, with a hybrid work model of 3 days in the office per week. Relocation assistance is offered to new employees.
OpenAI pushes the boundaries of AI capabilities and seeks to safely deploy them to the world through our products. We value diversity and are committed to creating an inclusive environment for all employees.