OpenAI is seeking a Software Engineer specializing in Networking to join their Platform Networking team. This role is crucial in developing the communication infrastructure that powers OpenAI's largest AI training operations. The position involves working with cutting-edge technology in a hybrid work environment based in San Francisco.
The Platform Networking team is responsible for the collective communication stack used in OpenAI's largest training jobs. Using C++ and CUDA, the team develops novel collective communication techniques that enable efficient training of flagship models on custom-built supercomputers. This work directly impacts AI research progress at OpenAI and the field as a whole.
As a Software Engineer in Networking, you'll be designing and implementing custom networking collectives tightly integrated into the training stack. The role requires expertise in low-level performance-critical software development, with collective communication experience being a valuable addition. You'll work closely with ML researchers, optimize network transports for large-scale training jobs, and contribute to future supercomputer network designs.
The position offers competitive compensation ranging from $380K to $555K and includes benefits such as relocation assistance. OpenAI provides a hybrid work model requiring 3 days in the office per week. The company is committed to diversity, equality, and ensuring AI benefits all of humanity.
This role is perfect for someone who thrives on technical challenges, has experience with RDMA distributed algorithms, and is comfortable with low-level performance-sensitive code. You'll be at the forefront of AI infrastructure development, working on systems that enable the training of some of the most advanced AI models in the world.