Anthropic is seeking a Staff Software Engineer to join its Interpretability team, which focuses on creating safe and beneficial AI systems. The role involves mechanistic interpretability: understanding how neural networks function internally, akin to doing the "biology" or "neuroscience" of neural networks. The team recently achieved a significant breakthrough with the Claude 3 Sonnet model, extracting millions of interpretable features and demonstrating that those features can be used to modify the model's behavior.
The position requires 5-10+ years of software development experience and offers a salary range of $315,000 to $560,000 USD. The role combines technical depth with research collaboration and requires proficiency in languages such as Python, Rust, Go, or Java. You'll implement research experiments, optimize research workflows, and build tools that support AI safety work.
The team operates in a hybrid arrangement out of Anthropic's San Francisco office, with at least 25% in-office presence expected. Anthropic offers comprehensive benefits including equity, visa sponsorship, flexible hours, and generous leave policies. The company values diversity and encourages applications from candidates with varied perspectives and backgrounds.
As part of a cohesive team working on large-scale research efforts, you'll contribute to projects such as scaling sparse autoencoder training across many GPUs and building tools to visualize millions of features; a minimal sketch of the sparse autoencoder technique appears below. The role emphasizes collaboration with researchers and with other teams across Anthropic, including Alignment Science and Societal Impacts, to improve model safety.
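To make "sparse autoencoders" concrete, here is a minimal sketch of the technique in PyTorch. All names here (SparseAutoencoder, d_model, n_features, l1_coef) are illustrative assumptions, not Anthropic's actual code; training at the scale described in the role would shard the model and data across many GPUs.

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    # Maps d_model-dimensional activations into an overcomplete
    # dictionary of n_features, then reconstructs the input.
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))  # sparse feature activations
        x_hat = self.decoder(f)          # reconstruction of the input
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coef: float = 1e-3):
    # Reconstruction error plus an L1 penalty that pushes most
    # feature activations to zero, yielding sparse, interpretable features.
    recon = (x - x_hat).pow(2).sum(dim=-1).mean()
    sparsity = f.abs().sum(dim=-1).mean()
    return recon + l1_coef * sparsity

# Toy usage on a hypothetical batch of 8 residual-stream activations.
x = torch.randn(8, 512)
sae = SparseAutoencoder(d_model=512, n_features=4096)
x_hat, f = sae(x)
loss = sae_loss(x, x_hat, f)

The dictionary is deliberately overcomplete (n_features much larger than d_model): the working hypothesis in this line of research is that each learned feature corresponds to a more interpretable direction in activation space than any individual neuron does.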
This position offers an opportunity to work at the forefront of AI safety and interpretability research, contributing to the understanding and development of trustworthy AI systems. The work directly affects the safety and reliability of models like Claude, making it a strong fit for those passionate about responsible AI development and its societal implications.