Software Engineer, Evals Infrastructure (Preparedness)

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits humanity.
Salary: $310,000
Category: Site Reliability
Level: Staff Software Engineer
Work arrangement: In-Person
Experience: 7+ years
Industry: AI

Description For Software Engineer, Evals Infrastructure (Preparedness)

OpenAI is seeking a Software Engineer for its Evals Infrastructure team within the Safety Systems department. The role supports the safe deployment of AI models and the reliability of the infrastructure used to evaluate them. It sits within the Preparedness team, which focuses on identifying and mitigating risks associated with frontier AI models.

The role combines site reliability engineering with AI safety, requiring expertise in scaling infrastructure and implementing robust monitoring systems. You'll be responsible for maintaining and enhancing system stability while supporting OpenAI's mission to develop safe AGI. The position offers a competitive salary of $310,000 plus equity and comprehensive benefits.

Key responsibilities include scaling evaluation infrastructure, implementing monitoring systems, and maintaining service level objectives. You'll work closely with cross-functional teams, participating in on-call rotations and production readiness reviews. The ideal candidate brings 7+ years of software engineering experience, strong cloud infrastructure knowledge, and expertise with tools like Kubernetes and observability platforms.

This San Francisco-based position offers an opportunity to work at the forefront of AI development while ensuring system reliability and safety. You'll join a team dedicated to preparing for and mitigating risks associated with increasingly capable AI systems. The role combines technical expertise with the broader mission of ensuring AI benefits humanity safely and effectively.

OpenAI provides comprehensive benefits including medical insurance, mental health support, generous parental leave, and learning opportunities. The company fosters an inclusive culture and is committed to considering diverse perspectives in AI development.


Responsibilities For Software Engineer, Evals Infrastructure (Preparedness)

  • Scale the infrastructure, supporting systems, and automation used for evaluations
  • Collaborate with development teams to make systems more reliable
  • Implement and manage monitoring systems
  • Develop and maintain service level objectives (SLOs) and service level indicators (SLIs)
  • Implement fault-tolerant and resilient design patterns
  • Build and maintain automation tools
  • Partner with engineers and researchers
  • Participate in on-call rotation

Requirements For Software Engineer, Evals Infrastructure (Preparedness)

Key skills: Kubernetes, Linux
  • Bachelor's degree in Computer Science, Information Technology, or related field
  • 7+ years of professional software engineering experience
  • Experience as a reliability engineer in a fast-paced company
  • Strong proficiency in cloud infrastructure
  • Proficiency in programming/scripting languages
  • Experience with containerization and Kubernetes
  • Knowledge of Infrastructure as Code tools
  • Experience with observability tools (DataDog, Prometheus, Grafana, Splunk, ELK stack)
  • Experience with microservices architecture
  • Knowledge of security best practices in cloud environments

Benefits For Software Engineer, Evals Infrastructure (Preparedness)

  • Medical, dental, and vision insurance for you and your family
  • Mental health and wellness support
  • 401(k) plan with 50% matching
  • Generous time off and company holidays
  • 24 weeks of paid birth-parent leave and 20 weeks of paid parental leave
  • Annual learning & development stipend ($1,500 per year)
  • Equity compensation
  • Relocation assistance
