Safeguards Research Engineer

AI research company focused on creating reliable, interpretable, and steerable AI systems for safe and beneficial deployment.
$320,000 - $560,000
Machine Learning
Senior Software Engineer
Hybrid
501 - 1,000 Employees
5+ years of experience
AI

Description For Safeguards Research Engineer

Anthropic is seeking a Safeguards Research Engineer to join their mission of creating safe and beneficial AI systems. As part of the Safeguards Research Team, you'll conduct critical safety research and engineering to ensure AI systems can be deployed safely. The role involves working on immediate safety challenges and longer-term research initiatives, including jailbreak robustness, automated red-teaming, and applied threat modeling.

You'll collaborate with multiple teams including Interpretability, Fine-Tuning, Frontier Red Team, and Alignment Science. The position requires both scientific and engineering mindsets, focusing on risks from current and future powerful AI systems. Projects include testing safety technique robustness, running multi-agent reinforcement learning experiments, and building evaluation tools for LLM-generated jailbreaks.

Anthropic operates as a cohesive team focused on large-scale research efforts, valuing impact over smaller specific puzzles. The company views AI research as an empirical science and maintains a highly collaborative environment. The role offers competitive compensation ($320,000-$560,000), flexible working arrangements, and comprehensive benefits including visa sponsorship.

The ideal candidate will have significant software and ML experience, familiarity with AI safety research, and strong collaborative skills. Experience with LLMs, reinforcement learning, and research paper authorship is highly valued. While based in San Francisco, the role requires at least 25% office presence, with flexibility for remote work. Join Anthropic in their mission to advance steerable, trustworthy AI while working on cutting-edge safety challenges in the field.

Last updated 2 hours ago

Responsibilities For Safeguards Research Engineer

  • Conduct critical safety research and engineering for AI systems
  • Test robustness of safety techniques through model training
  • Run multi-agent reinforcement learning experiments
  • Build tooling to evaluate LLM-generated jailbreaks
  • Write scripts and prompts for model evaluation
  • Contribute to research papers, blog posts, and talks
  • Run experiments for Responsible Scaling Policy implementation

Requirements For Safeguards Research Engineer

Python
Kubernetes
  • Bachelor's degree in a related field or equivalent experience
  • Significant software, ML, or research engineering experience
  • Experience contributing to empirical AI research projects
  • Familiarity with technical AI safety research
  • Ability to work in collaborative projects
  • Must be able to travel 25% to the Bay Area

Benefits For Safeguards Research Engineer

Visa Sponsorship
Equity
  • Competitive compensation and benefits
  • Optional equity donation matching
  • Generous vacation and parental leave
  • Flexible working hours
  • Office space in San Francisco
  • Visa sponsorship available

Interested in this job?

Jobs Related To Anthropic Safeguards Research Engineer

Biosecurity Research Engineer

Senior Machine Learning Engineer role focused on AI safety and biosecurity research at Anthropic

Research Engineer, Frontier Red Team

Senior Research Engineer position at Anthropic focusing on AI safety evaluation and implementation of responsible scaling policies for frontier AI models.

Research Engineer, Frontier Red Team

Senior Research Engineer position at Anthropic focusing on AI safety evaluation and risk assessment for frontier AI models.

Software Engineer, Model Context Protocol

Senior Software Engineer position at Anthropic focusing on Model Context Protocol development with competitive compensation range of $320K-$560K.

Software Engineer - Anthropic Labs

Software Engineer role at Anthropic Labs focusing on prototyping and evaluating emerging AI capabilities.