Anthropic is seeking a Safeguards Research Engineer to join its mission of creating safe and beneficial AI systems. As part of the Safeguards Research Team, you'll conduct critical safety research and engineering to ensure AI systems can be deployed safely. The role involves working on immediate safety challenges and longer-term research initiatives, including jailbreak robustness, automated red-teaming, and applied threat modeling.
You'll collaborate with multiple teams, including Interpretability, Fine-Tuning, Frontier Red Team, and Alignment Science. The position requires both scientific and engineering mindsets, with a focus on risks from current and future powerful AI systems. Projects include testing the robustness of safety techniques, running multi-agent reinforcement learning experiments, and building tools to evaluate LLM-generated jailbreaks.
Anthropic operates as a cohesive team focused on large-scale research efforts, valuing overall impact over smaller, specific puzzles. The company views AI research as an empirical science and maintains a highly collaborative environment. The role offers competitive compensation ($320,000-$560,000), flexible working arrangements, and comprehensive benefits including visa sponsorship.
The ideal candidate will have significant software and ML experience, familiarity with AI safety research, and strong collaborative skills. Experience with LLMs, reinforcement learning, and research paper authorship is highly valued. The role is based in San Francisco and requires at least 25% office presence, with flexibility for remote work the rest of the time. Join Anthropic in its mission to advance steerable, trustworthy AI while working on cutting-edge safety challenges in the field.