Anthropic is seeking a Staff Software Engineer to join its Interpretability team, which focuses on creating safe and beneficial AI systems. The role involves mechanistic interpretability: understanding how neural networks function internally, akin to doing the "biology" or "neuroscience" of neural networks. The team recently achieved a significant breakthrough with the Claude 3 Sonnet model, extracting millions of interpretable features and demonstrating that those features can be used to modify the model's behavior.
The position requires 5-10+ years of software development experience and offers a salary range of $315,000 to $560,000 USD. The role combines technical depth with research collaboration and requires proficiency in languages such as Python, Rust, Go, or Java. You'll implement research experiments, optimize research workflows, and build tools that support AI safety work.
The team operates in a hybrid arrangement out of Anthropic's San Francisco office, with at least 25% in-office presence expected. Anthropic offers comprehensive benefits including equity, visa sponsorship, flexible hours, and generous leave policies. The company values diversity and encourages applications from candidates with varied perspectives and backgrounds.
As part of a cohesive team working on large-scale research efforts, you'll contribute to projects such as scaling sparse autoencoder training across many GPUs and building tools to visualize millions of features; a minimal sketch of the sparse autoencoder technique appears below. The role emphasizes collaboration with researchers and with other teams across Anthropic, including Alignment Science and Societal Impacts, to improve model safety.
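To make "sparse autoencoders" concrete, here is a minimal sketch of the technique in PyTorch. All names here (SparseAutoencoder, d_model, n_features, l1_coef) are illustrative assumptions, not Anthropic's actual code; training at the scale described in the role would shard the model and data across many GPUs.

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    # Maps d_model-dimensional activations into an overcomplete
    # dictionary of n_features, then reconstructs the input.
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))  # sparse feature activations
        x_hat = self.decoder(f)          # reconstruction of the input
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coef: float = 1e-3):
    # Reconstruction error plus an L1 penalty that pushes most
    # feature activations to zero, yielding sparse, interpretable features.
    recon = (x - x_hat).pow(2).sum(dim=-1).mean()
    sparsity = f.abs().sum(dim=-1).mean()
    return recon + l1_coef * sparsity

# Toy usage on a hypothetical batch of 8 residual-stream activations.
x = torch.randn(8, 512)
sae = SparseAutoencoder(d_model=512, n_features=4096)
x_hat, f = sae(x)
loss = sae_loss(x, x_hat, f)

The dictionary is deliberately overcomplete (n_features much larger than d_model): the working hypothesis in this line of research is that each learned feature corresponds to a more interpretable direction in activation space than any individual neuron does.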
This position offers an opportunity to work at the forefront of AI safety and interpretability research, contributing to the understanding and development of trustworthy AI systems. The work directly affects the safety and reliability of models like Claude, making it a strong fit for those passionate about responsible AI development and its societal implications.