Zhongzhu / Charlie
Tag: #LLM Training
56 posts tagged with this label. Back to all tags or the main feed.
2026
07-02
EN
Tangram: Hiding GPU Heterogeneity for Efficient LLM Parallelization
07-02
中
Tangram: An Efficient LLM Parallelization System That Hides Hardware Differences in Heterogeneous GPU Clusters
06-30
EN
DAPO: An Open-Source LLM Reinforcement Learning System at Scale — Technical Review
06-30
中
DAPO: Reading Notes on a Large-Scale LLM Reinforcement Learning System
06-16
EN
Back to Basics: Revisiting REINFORCE Style Optimization for RLHF (RLOO)
06-16
中
Back to Basics: Rethinking Policy Gradient Optimization in RLHF with RLOO
06-13
EN
ForeMoE: Micro-step-level MoE Load Balancing for RL Post-training via Routing Foresight
06-13
中
ForeMoE: Micro-step-level MoE Load Balancing in RL Post-training via Routing Foresight
06-11
EN
MegaScale: Engineering 55% MFU at 12,288 GPUs for LLM Training
06-11
中
MegaScale: How ByteDance Achieves 55% MFU on 12,288 GPUs for Large-Scale LLM Training
06-09
EN
VAPO: Value-Augmented Proximal Policy Optimization for Long-CoT Reasoning
06-09
中
VAPO: Value-Augmented Proximal Policy Optimization for Long-Chain Reasoning
06-06
EN
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
06-06
中
DeepSeek-R1: Incentivizing Reasoning Capability in Large Language Models with Reinforcement Learning
06-02
EN
REINFORCE++: Stabilizing Critic-Free Policy Optimization with Global Advantage Normalization
06-02
中
REINFORCE++: Stabilizing Critic-Free Policy Optimization with Global Advantage Normalization
05-31
EN
Group Sequence Policy Optimization: A Sequence-Level RL Algorithm for Training Large Language Models
05-31
中
Group Sequence Policy Optimization: An RL Training Method That Corrects GRPO with Sequence-Level Importance Sampling
05-26
EN
SimPO: Simple Preference Optimization with a Reference-Free Reward
05-26
中
SimPO: Simple Preference Optimization Without a Reference Model
05-25
EN
CodeAct: Executable Code Actions Elicit Better LLM Agents
05-25
中
CodeAct: Driving Stronger LLM Agents with Executable Code
05-24
EN
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
05-24
中
FlashAttention-2: Better Parallelism and Thread-Block Work Partitioning
05-23
EN
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
05-23
中
DeepSeek-R1: Incentivizing Reasoning Capability in Large Language Models with Reinforcement Learning
05-22
EN
DoRA: Weight-Decomposed Low-Rank Adaptation — Technical Review
05-22
中
DoRA: Weight-Decomposed Low-Rank Adaptation, Improving LoRA's Learning Capacity by Decoupling Magnitude and Direction | Reading Notes
05-19
EN
KTO: Model Alignment as Prospect Theoretic Optimization — Technical Blog Review
05-19
中
KTO: Viewing Model Alignment as "Prospect Theory" Optimization (Reading Notes)
05-15
EN
Zero Sum SVD: A Global, Loss-Aware Rank Budget for LLM Compression
05-15
中
Zero Sum SVD: An LLM Compression Method That Allocates a Global Singular-Value Budget via "Zero-Sum Loss"
05-14
EN
DisagMoE: Disaggregating Attention and FFN to Beat the MoE All-to-All Bottleneck
05-14
中
DisagMoE: Breaking the All-to-All Bottleneck in MoE Training by Disaggregating Attention and FFN
05-09
EN
Queueing Stability for LLM Inference with KV Cache Memory Constraints
05-07
EN
Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism
05-01
EN
Low-Rank Optimization Trajectories for LLM RLVR Acceleration: A Technical Review of NExt
04-29
EN
FEPLB Technical Review: Nearly Free MoE Load Balancing with the NVLink Copy Engine
04-24
EN
Generalization at the Edge of Stability: A Random Dynamical Systems Perspective
04-24
EN
FEPLB: Zero-Cost MoE Load Balancing via NVLink Copy Engine
04-16
EN
PipeDream: Turning Pipeline Parallelism into a Practical Training System — Deep Technical Review
04-16
中
PipeDream: Turning Pipeline Parallelism into a Truly Trainable System (In-Depth Reading Notes)
04-09
EN
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving — Deep Technical Review
04-09
中
DistServe: Goodput-Oriented Large Model Serving Optimization via Prefill/Decoding Disaggregation (In-Depth Reading Notes)
04-07
EN
ORPO: Monolithic Preference Optimization without Reference Model — In-Depth Technical Review
04-07
中
ORPO: Monolithic Preference Optimization Without a Reference Model (In-Depth Reading Notes)
03-27
EN
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection — In-Depth Technical Review
03-26
EN
Alpa: Automating Inter- and Intra-Operator Parallelism — In-Depth Technical Review
03-19
EN
ZeRO: Shattering the Memory Wall — How DeepSpeed Trains Trillion-Parameter Models
03-12
EN
Megatron-LM: NVIDIA's Blueprint for Training Billion-Parameter Models at Scale
03-12
EN
PaRO: Smarter Partitioning for Distributed Training — Beyond ZeRO's One-Size-Fits-All
02-19
EN
vLLM and PagedAttention: Efficient Memory Management for Large Language Model Serving — Technical Review
02-18
EN
GLM-5 Technical Review: From Vibe Coding to Agentic Engineering
2023
04-29
EN
ComputerArchitecture-Day1
2019
11-26
EN
Tensorflow-Day1-DNN Explain
11-24
EN
Reinforcement Learning_WatermelonBook_Summary