
Paper Reading: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Session led by Mike A

Join us for an insightful session on the groundbreaking paper, DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. In this session, we will explore DeepSeek-R1, a state-of-the-art reasoning model that pushes the boundaries of reinforcement learning (RL) applied to large language models (LLMs).

This paper introduces two novel models, DeepSeek-R1-Zero and DeepSeek-R1, showcasing advancements in reasoning capabilities through RL-driven self-evolution. Unlike traditional models relying heavily on supervised fine-tuning (SFT), DeepSeek-R1-Zero develops its reasoning abilities purely through RL, while DeepSeek-R1 combines RL with a multi-stage training pipeline for enhanced performance.

The paper highlights exceptional benchmarks achieved by DeepSeek-R1 on math, coding, and STEM-related reasoning tasks, where its performance rivals that of leading closed-source models such as OpenAI's o1-1217. Additionally, we’ll discuss distilling these capabilities into smaller, more efficient models to make advanced reasoning accessible for diverse applications.

Whether you're a researcher, developer, or enthusiast in AI and LLMs, this paper reading will provide an in-depth understanding of the novel reinforcement learning techniques driving DeepSeek-R1 and its implications for the future of AI-driven reasoning systems. Don't miss this opportunity to engage with cutting-edge advancements in the field!

Paper: [2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

GitHub - deepseek-ai/DeepSeek-R1

Mark Chen X Post: https://x.com/markchen90/status/1884303237186216272?s=12&t=BSlMYtugAr8LLWXLnMLLrQ

GRPO Diagram: https://www.reddit.com/r/LocalLLaMA/comments/1i78sfs/deepseek_r1_grpo_code_open_sourced/#lightbox
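As context for the GRPO diagram linked above: the key idea in GRPO (introduced in the DeepSeek Math paper, also linked below) is to drop the learned value baseline and instead compute a group-relative advantage, normalizing each sampled completion's reward against the mean and standard deviation of its group for the same prompt. A minimal sketch of that normalization step, with a hypothetical helper name (not the DeepSeek code):

```python
import math

def grpo_advantages(rewards, eps=1e-8):
    # Group-relative advantage: normalize each completion's reward by the
    # group's mean and (population) standard deviation. No value network
    # is needed; the group itself serves as the baseline.
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 completions sampled for one prompt, scored 1.0 by a
# rule-based reward (correct final answer) and 0.0 otherwise.
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
# Correct completions get positive advantage, incorrect ones negative.
```

These per-completion advantages then weight the policy-gradient update (with a KL penalty toward a reference model) in the full GRPO objective.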

The Bitter Lesson: http://www.incompleteideas.net/IncIdeas/BitterLesson.html

Gradient Descent Image: https://miro.medium.com/v2/format:webp/1*f9a162GhpMbiTVTAua_lLQ.png

V3: https://arxiv.org/abs/2412.19437

DeepSeek Math: https://arxiv.org/abs/2402.03300

rStar-Math: https://arxiv.org/abs/2501.04519

Verify Step-by-Step: https://arxiv.org/abs/2305.20050

DeepSeek catch-up chart: https://media.licdn.com/dms/image/v2/D4E22AQG552O63UNJWg/feedshare-shrink_800/B4EZTN[…]41824000&v=beta&t=ubyKQniaCJTL37PzIOJi9YZRo1AF8yipuauSioyn59U

UC Berkeley Student $30 replication of Aha: https://x.com/jiayi_pirate/status/1882839370505621655

Deep Agent R1-V replication: https://x.com/liangchen5518/status/1886171667522842856

s1: https://arxiv.org/abs/2501.19393

DeepSeek-R1 cost breakdown: https://semianalysis.com/2025/01/31/deepseek-debates/

R1 Deep Dive: https://fireworks.ai/blog/deepseek-r1-deepdive

All-In Podcast: https://podcasts.apple.com/us/podcast/all-in-with-chamath-jason-sacks-friedberg/id1502871393?i=1000687638652

Peter Gostev LinkedIn: https://www.linkedin.com/in/peter-gostev/

LLM Agents Learning: https://llmagents-learning.org/sp25