Zhongzhu / Charlie
Tag: #LLM Training
26 posts tagged with this label.
2026
05-15
EN
Zero Sum SVD: A Global, Loss-Aware Rank Budget for LLM Compression
05-15
ZH
Zero Sum SVD: An LLM Compression Method That Allocates a Global Singular-Value Budget via "Zero-Sum Loss"
05-14
EN
DisagMoE: Disaggregating Attention and FFN to Beat the MoE All-to-All Bottleneck
05-14
ZH
DisagMoE: Breaking Through the All-to-All Bottleneck in MoE Training by Disaggregating Attention and FFN
05-09
EN
Queueing Stability for LLM Inference with KV Cache Memory Constraints
05-07
EN
Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism
05-01
EN
Low-Rank Optimization Trajectories for LLM RLVR Acceleration: A Technical Review of NExt
04-29
EN
FEPLB Technical Review: Nearly Free MoE Load Balancing with the NVLink Copy Engine
04-24
EN
Generalization at the Edge of Stability: A Random Dynamical Systems Perspective
04-24
EN
FEPLB: Zero-Cost MoE Load Balancing via NVLink Copy Engine
04-16
EN
PipeDream: Turning Pipeline Parallelism into a Practical Training System — Deep Technical Review
04-16
ZH
PipeDream: Turning Pipeline Parallelism into a Truly Trainable System (In-Depth Reading Notes)
04-09
EN
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving — Deep Technical Review
04-09
ZH
DistServe: Goodput-Oriented LLM Serving Optimization via Prefill/Decoding Disaggregation (In-Depth Reading Notes)
04-07
EN
ORPO: Monolithic Preference Optimization without Reference Model — In-Depth Technical Review
04-07
ZH
ORPO: Monolithic Preference Optimization Without a Reference Model (In-Depth Reading Notes)
03-27
EN
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection — In-Depth Technical Review
03-26
EN
Alpa: Automating Inter- and Intra-Operator Parallelism — In-Depth Technical Review
03-19
EN
ZeRO: Shattering the Memory Wall — How DeepSpeed Trains Trillion-Parameter Models
03-12
EN
Megatron-LM: NVIDIA's Blueprint for Training Billion-Parameter Models at Scale
03-12
EN
PaRO: Smarter Partitioning for Distributed Training — Beyond ZeRO's One-Size-Fits-All
02-19
EN
vLLM and PagedAttention: Efficient Memory Management for Large Language Model Serving — Technical Review
02-18
EN
GLM-5 Technical Review: From Vibe Coding to Agentic Engineering
2023
04-29
EN
ComputerArchitecture-Day1
2019
11-26
EN
Tensorflow-Day1-DNN Explain
11-24
EN
Reinforcement Learning_WatermelonBook_Summary