Anthropic is seeking a Research Engineer to join their Reinforcement Learning Fundamentals team. In this role, you will collaborate with researchers and engineers to advance the capabilities and safety of large language models through fundamental research in reinforcement learning. You'll work on improving reasoning abilities in areas such as code generation and mathematics, and explore reinforcement learning for agentic and open-ended tasks.
Key responsibilities include:
- Developing and implementing novel reinforcement learning techniques to improve the performance and safety of large language models
- Creating tools and environments for models to interact with, enabling them to perform complex, open-ended tasks
- Designing and running experiments to enhance models' reasoning capabilities, particularly in code generation and mathematics
The ideal candidate will have:
- 5+ years of relevant industry experience
- Proficiency in Python and experience with deep learning frameworks such as PyTorch or JAX
- Strong software engineering background
- Passion for pair programming
- Commitment to code quality, testing, and performance
- A deep interest in the potential impact of AI and dedication to developing safe and beneficial systems
Strong candidates may also have:
- Background in machine learning, reinforcement learning, or high-performance computing
- Experience with virtualization and sandboxed code execution environments
- Experience with Kubernetes
- Contributions to open-source projects or published research papers in relevant fields
Anthropic offers a competitive compensation package, including a salary range of £250,000 to £340,000 GBP, equity, and comprehensive benefits. The company has a hybrid work policy and expects staff to be in one of its offices at least 25% of the time.
Join Anthropic in their mission to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society as a whole.