Tag

#LLM Training

56 posts tagged with this label.

2026

07-02 EN

Tangram: Hiding GPU Heterogeneity for Efficient LLM Parallelization
07-02 ZH

Tangram: An Efficient LLM Parallelization System That Hides Hardware Differences in Heterogeneous GPU Clusters
06-30 EN

DAPO: An Open-Source LLM Reinforcement Learning System at Scale — Technical Review
06-30 ZH

DAPO: Reading Notes on a Large-Scale LLM Reinforcement Learning System
06-16 EN

Back to Basics: Revisiting REINFORCE Style Optimization for RLHF (RLOO)
06-16 ZH

Back to Basics: Rethinking Policy Gradient Optimization in RLHF with RLOO
06-13 EN

ForeMoE: Micro-step-level MoE Load Balancing for RL Post-training via Routing Foresight
06-13 ZH

ForeMoE: Micro-step-level MoE Load Balancing in RL Post-training via Routing Foresight
06-11 EN

MegaScale: Engineering 55% MFU at 12,288 GPUs for LLM Training
06-11 ZH

MegaScale: How ByteDance Achieves 55% MFU on 12,288 GPUs for Large-Scale LLM Training
06-09 EN

VAPO: Value-Augmented Proximal Policy Optimization for Long-CoT Reasoning
06-09 ZH

VAPO: Value-Augmented Proximal Policy Optimization for Long-Chain Reasoning
06-06 EN

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
06-06 ZH

DeepSeek-R1: Using Reinforcement Learning to Elicit Reasoning Capability in Large Language Models
06-02 EN

REINFORCE++: Stabilizing Critic-Free Policy Optimization with Global Advantage Normalization
06-02 ZH

REINFORCE++: Stabilizing Critic-Free Policy Optimization with Global Advantage Normalization
05-31 EN

Group Sequence Policy Optimization: A Sequence-Level RL Algorithm for Training Large Language Models
05-31 ZH

Group Sequence Policy Optimization: An RL Training Method That Corrects GRPO with Sequence-Level Importance Sampling
05-26 EN

SimPO: Simple Preference Optimization with a Reference-Free Reward
05-26 ZH

SimPO: Simple Preference Optimization Without a Reference Model
05-25 EN

CodeAct: Executable Code Actions Elicit Better LLM Agents
05-25 ZH

CodeAct: Driving Stronger LLM Agents with Executable Code
05-24 EN

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
05-24 ZH

FlashAttention-2: Better Parallelism and Thread-Block Work Partitioning
05-23 EN

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
05-23 ZH

DeepSeek-R1: Using Reinforcement Learning to Elicit Reasoning Capability in Large Language Models
05-22 EN

DoRA: Weight-Decomposed Low-Rank Adaptation — Technical Review
05-22 ZH

DoRA: Weight-Decomposed Low-Rank Adaptation, Improving LoRA's Learning Capacity by Decoupling Magnitude and Direction | Reading Notes
05-19 EN

KTO: Model Alignment as Prospect Theoretic Optimization — Technical Blog Review
05-19 ZH

KTO: Model Alignment as "Prospect Theory" Optimization | Reading Notes
05-15 EN

Zero Sum SVD: A Global, Loss-Aware Rank Budget for LLM Compression
05-15 ZH

Zero Sum SVD: An LLM Compression Method That Allocates a Global Singular-Value Budget via "Zero-Sum Loss"
05-14 EN

DisagMoE: Disaggregating Attention and FFN to Beat the MoE All-to-All Bottleneck
05-14 ZH

DisagMoE: Breaking the All-to-All Bottleneck in MoE Training by Disaggregating Attention and FFN
05-09 EN

Queueing Stability for LLM Inference with KV Cache Memory Constraints
05-07 EN

Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism
05-01 EN

Low-Rank Optimization Trajectories for LLM RLVR Acceleration: A Technical Review of NExt
04-29 EN

FEPLB Technical Review: Nearly Free MoE Load Balancing with the NVLink Copy Engine
04-24 EN

Generalization at the Edge of Stability: A Random Dynamical Systems Perspective
04-24 EN

FEPLB: Zero-Cost MoE Load Balancing via NVLink Copy Engine
04-16 EN

PipeDream: Turning Pipeline Parallelism into a Practical Training System — Deep Technical Review
04-16 ZH

PipeDream: Turning Pipeline Parallelism into a Truly Trainable System | In-Depth Reading Notes
04-09 EN

DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving — Deep Technical Review
04-09 ZH

DistServe: Goodput-Oriented Large Model Serving Optimization via Prefill/Decoding Disaggregation | In-Depth Reading Notes
04-07 EN

ORPO: Monolithic Preference Optimization without Reference Model — In-Depth Technical Review
04-07 ZH

ORPO: Monolithic Preference Optimization Without a Reference Model | In-Depth Reading Notes
03-27 EN

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection — In-Depth Technical Review
03-26 EN

Alpa: Automating Inter- and Intra-Operator Parallelism — In-Depth Technical Review
03-19 EN

ZeRO: Shattering the Memory Wall — How DeepSpeed Trains Trillion-Parameter Models
03-12 EN

Megatron-LM: NVIDIA's Blueprint for Training Billion-Parameter Models at Scale
03-12 EN

PaRO: Smarter Partitioning for Distributed Training — Beyond ZeRO's One-Size-Fits-All
02-19 EN

vLLM and PagedAttention: Efficient Memory Management for Large Language Model Serving — Technical Review
02-18 EN

GLM-5 Technical Review: From Vibe Coding to Agentic Engineering

2023

04-29 EN

ComputerArchitecture-Day1

2019

11-26 EN

Tensorflow-Day1-DNN Explain
11-24 EN

Reinforcement Learning_WatermelonBook_Summary