Software Engineer, Hardware Health

AI research and deployment company dedicated to ensuring general-purpose artificial intelligence benefits all of humanity.
$295,000 - $440,000
Backend
Senior Software Engineer
In-Person
7+ years of experience
AI

Description For Software Engineer, Hardware Health

OpenAI's Hardware Health team is dedicated to ensuring the optimal performance and reliability of custom-built hyperscale supercomputers. The team focuses on maximizing supercomputing capacity for research and minimizing the impact of hardware faults on researchers. As a software engineer on the Hardware Health team, you will maintain a sophisticated suite of hardware health tests and collaborate with researchers and the Supercomputing team on root-causing and reproducing newly discovered problems. The role requires 7+ years of industry experience in software engineering, excellent abilities in Python and shell scripting, and comfort with data analysis using SQL, PromQL, and Pandas. The ideal candidate has a strong sense of ownership, attention to detail, and prior team lead experience. The team operates within the broader Platform organization, incubated inside OpenAI's Research team, supporting the engineering and research required to train large-scale AI models. OpenAI offers a fast-paced environment with high autonomy and the opportunity to effect change in cutting-edge AI research infrastructure.

Key Responsibilities:

  • Maintain and develop hardware health tests
  • Collaborate on root-causing and reproducing hardware problems
  • Analyze data using SQL, PromQL, and Pandas
  • Develop reproducible analyses and build dashboards/visualizations
  • Monitor outcomes of deployed updates

Required Skills:

  • 7+ years of industry experience in software engineering
  • Strong Python and shell scripting abilities
  • Data analysis skills with SQL, PromQL, and Pandas
  • Experience in building dashboards and visualizations
  • Attention to detail in identifying data anomalies
  • Prior team lead experience (2+ years)

Bonus Skills:

  • Expertise in low-level hardware components and protocols
  • Experience with Linux tooling
  • Visualization skills for large data centers and networks

OpenAI is committed to diversity, equal opportunity, and providing reasonable accommodations to applicants with disabilities. The company offers a competitive compensation range of $295K – $440K for this role.

Last updated 7 months ago

Responsibilities For Software Engineer, Hardware Health

  • Maintain a sophisticated suite of hardware health tests
  • Collaborate with researchers and the Supercomputing team on root-causing and reproducing problems
  • Analyze data using SQL, PromQL, and Pandas
  • Develop reproducible analyses and build dashboards/visualizations
  • Monitor outcomes of deployed updates

Requirements For Software Engineer, Hardware Health

Python
Linux
  • 7+ years of industry experience in software engineering
  • Excellent abilities developing in Python and shell scripting
  • High degree of comfort with SQL, PromQL, and Pandas
  • Experience developing reproducible analyses / building dashboards and visualizations
  • Strong attention to detail
  • Prior team lead (TL) experience (2+ years)
