Software Engineer, Networking

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity.
$360,000 - $530,000
Distributed Systems
Senior Software Engineer
Hybrid
AI

Description For Software Engineer, Networking

The Platform Networking team at OpenAI is responsible for the collective communication stack used in our largest training jobs. Using C++ and CUDA, we work on novel collective communication techniques that enable efficient training of our flagship models on our largest custom-built supercomputers.

As a Software Engineer, Networking, you will design and implement custom networking collectives that are tightly integrated into our training stack. We're looking for people with a background in low-level performance-critical software. Experience with collective communication is a bonus.

In this role, you will:

  • Collaborate closely with ML researchers to design and implement efficient collective operations in C++ and CUDA.
  • Ensure that our largest training jobs take full advantage of the different network transports used in our supercomputers.
  • Work on simulations to inform our future supercomputer network designs.

You might thrive in this role if you:

  • Have written distributed algorithms using RDMA in the past.
  • Are comfortable writing low-level performance-sensitive CPU and/or GPU code.
  • Are familiar with network simulation techniques.

This role is based in San Francisco, CA, with a hybrid work model of 3 days in the office per week. Relocation assistance is offered to new employees.

OpenAI pushes the boundaries of AI capabilities and seeks to safely deploy them to the world through our products. We value diversity and are committed to creating an inclusive environment for all employees.

Last updated 4 months ago

Responsibilities For Software Engineer, Networking

  • Design and implement custom networking collectives integrated into the training stack
  • Collaborate with ML researchers to design and implement efficient collective operations in C++ and CUDA
  • Ensure the largest training jobs take full advantage of the different network transports used in the supercomputers
  • Work on simulations for future supercomputer network designs

Requirements For Software Engineer, Networking

  • Background in low-level performance-critical software
  • Experience with distributed algorithms using RDMA (preferred)
  • Comfortable writing low-level performance-sensitive CPU and/or GPU code
  • Familiarity with network simulation techniques

Benefits For Software Engineer, Networking

  • Relocation Benefits
