Staff Software Engineer, Interpretability

Anthropic creates reliable, interpretable, and steerable AI systems, focusing on safe and beneficial AI development for users and society.
$315,000 - $560,000
Machine Learning
Staff Software Engineer
Hybrid
5+ years of experience
AI

Description For Staff Software Engineer, Interpretability

Anthropic is seeking a Staff Software Engineer to join its Interpretability team, which works toward safe and beneficial AI systems. The role centers on mechanistic interpretability: understanding how neural networks function, much like doing the "biology" or "neuroscience" of neural networks. The team recently achieved significant breakthroughs with the Claude 3.0 Sonnet model, extracting millions of meaningful features and demonstrating that they can be used to modify the model's behavior.

The position requires 5-10+ years of software development experience and offers a competitive salary range of $315,000 to $560,000 USD. The role combines technical expertise with research collaboration, requiring proficiency in languages like Python, Rust, Go, or Java. You'll work on implementing research experiments, optimizing workflows, and building tools for AI safety improvements.

The team operates in a hybrid work environment from their San Francisco office, with at least 25% in-office presence required. Anthropic offers comprehensive benefits including equity options, visa sponsorship, flexible hours, and generous leave policies. The company values diversity and encourages applications from candidates with varied perspectives and backgrounds.

As part of a cohesive team working on large-scale research efforts, you'll contribute to projects like optimizing sparse autoencoders across GPUs and building visualization tools for millions of features. The role emphasizes collaboration with researchers and other teams across Anthropic, including Alignment Science and Societal Impacts, to enhance model safety.

This position offers a unique opportunity to work at the forefront of AI safety and interpretability research, contributing to the understanding and development of trustworthy AI systems. The work directly impacts the safety and reliability of AI models like Claude, making it an ideal role for those passionate about responsible AI development and its societal implications.

Last updated 3 months ago

Responsibilities For Staff Software Engineer, Interpretability

  • Implement and analyze research experiments in toy scenarios and large models
  • Set up and optimize research workflows for large-scale operations
  • Build tools and abstractions for rapid research experimentation
  • Develop tools and infrastructure to support teams in using Interpretability's work for model safety

Requirements For Staff Software Engineer, Interpretability

  • 5-10+ years of experience building software
  • Highly proficient in at least one programming language (Python, Rust, Go, Java)
  • Experience with empirical AI research projects
  • Strong ability to prioritize and direct effort toward impactful work
  • Comfortable with ambiguity and questioning assumptions
  • Preference for fast-moving collaborative projects
  • Interest in machine learning research and applications
  • Care about the societal impacts and ethics of your work

Benefits For Staff Software Engineer, Interpretability

  • Visa sponsorship
  • Equity
  • Competitive compensation and benefits
  • Optional equity donation matching
  • Generous vacation and parental leave
  • Flexible working hours
  • Office space in San Francisco
