Zhongzhu / Charlie
Tag #Reasoning
40 posts tagged with this label. Back to all tags or the main feed.
2026
06-30
EN
DAPO: An Open-Source LLM Reinforcement Learning System at Scale — Technical Review
06-30
ZH
DAPO: Reading Notes on a Large-Scale LLM Reinforcement Learning System
06-29
EN
ACTS: Steering How LLMs Reason, Not Just How Long
06-29
ZH
ACTS: An RL-Trained Controller That Makes LLM Reasoning Smarter, Not Just Shorter
06-23
EN
Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback
06-23
ZH
Critique-GRPO: Breaking Through RL Training Bottlenecks with Natural-Language Critique Feedback
06-22
EN
MRAgent: Why Memory Should Be Reconstructed, Not Retrieved
06-22
ZH
MRAgent: Memory Should Be Reconstructed, Not Retrieved
06-19
EN
LASER: How Throwing Away 99% of a Weight Matrix Can Make LLMs Smarter
06-19
ZH
LASER: Discard 99% of a Matrix's Rank, and LLM Reasoning Accuracy Actually Rises by 27%
06-09
EN
VAPO: Value-Augmented Proximal Policy Optimization for Long-CoT Reasoning
06-09
ZH
VAPO: Value-Augmented Proximal Policy Optimization for Long-Chain Reasoning
06-08
EN
ExpWeaver: How LLM Agents Learn from Past Experience in Latent Space
06-08
ZH
ExpWeaver: How LLM Agents Learn from Experience in Latent Space
06-06
EN
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
06-06
ZH
DeepSeek-R1: Using Reinforcement Learning to Incentivize Reasoning in Large Language Models
05-31
EN
Group Sequence Policy Optimization: A Sequence-Level RL Algorithm for Training Large Language Models
05-31
ZH
Group Sequence Policy Optimization: An RL Training Method That Fixes GRPO with Sequence-Level Importance Sampling
05-25
EN
CodeAct: Executable Code Actions Elicit Better LLM Agents
05-25
ZH
CodeAct: Driving Stronger LLM Agents with Executable Code
05-23
EN
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
05-23
ZH
DeepSeek-R1: Using Reinforcement Learning to Incentivize Reasoning in Large Language Models
05-18
EN
Why Single-Agent LLMs Beat Multi-Agent Systems on Multi-Hop Reasoning — A Budget-Controlled Story
05-18
ZH
Why a Single Agent Beats Multiple Agents Once the Thinking Budget Is Fixed: Reading Notes
05-15
EN
Zero Sum SVD: A Global, Loss-Aware Rank Budget for LLM Compression
05-15
ZH
Zero Sum SVD: LLM Compression That Allocates a Global Singular-Value Budget Using a "Zero-Sum Loss" Criterion
05-12
EN
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
05-12
ZH
DAPO: A Large-Scale Open-Source LLM Reinforcement Learning System
05-01
EN
Low-Rank Optimization Trajectories for LLM RLVR Acceleration: A Technical Review of NExt
04-27
EN
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond — Technical Review
04-22
EN
SAGE: Training-Free Semantic Evidence Composition for Edge-Cloud Inference Under Hard Uplink Budgets
04-19
EN
SpecGuard: Verification-Aware Speculative Decoding for Efficient Multi-Step Reasoning
04-19
ZH
SpecGuard: Verification-Aware Speculative Decoding for Multi-Step Reasoning
04-13
EN
Toolformer: Language Models Can Teach Themselves to Use Tools — Deep Technical Review
04-13
ZH
Toolformer: Letting Language Models Learn on Their Own "When to Call a Tool" (In-Depth Reading Notes)
04-11
EN
Language Agent Tree Search (LATS): Unifying Reasoning, Acting, and Planning in Language Models — Deep Technical Review
04-11
ZH
LATS (Language Agent Tree Search): Unifying Reasoning, Acting, and Planning in a Single Language Model Agent Framework (In-Depth Reading Notes)
03-30
EN
Chain-of-Thought Prompting Elicits Reasoning in LLMs — In-Depth Technical Review
02-20
EN
DeepSeekMath: How 120B Tokens of Math Data and GRPO Rival GPT-4 on Competition Problems
02-16
EN
Tree of Thoughts: Deliberate Problem Solving with Large Language Models — Technical Review