Zhongzhu / Charlie
Tag: #Speculative Decoding
13 posts tagged with this label.
2026
05-17
EN
PipeSD: Cloud-Edge Collaborative Pipeline Inference with Speculative Decoding — Technical Review
05-17
中
PipeSD: A Cloud-Edge Collaborative Pipeline Inference Framework Based on Speculative Decoding (Reading Notes)
05-16
EN
An Interpretable Latency Model for Speculative Decoding in LLM Serving — Technical Review
05-16
中
Explaining the Speedup Curve of Speculative Decoding in Real-World Serving with Little's Law (Reading Notes)
05-14
EN
DisagMoE: Disaggregating Attention and FFN to Beat the MoE All-to-All Bottleneck
05-14
中
DisagMoE: Disaggregating Attention and FFN to Break Through the All-to-All Bottleneck in MoE Training
05-10
EN
Tutti: Making SSD-Backed KV Cache Practical for Long-Context LLM Serving
05-10
中
Tutti: Making SSD-Based KV Cache Truly Practical for Long-Context LLM Serving
04-19
EN
SpecGuard: Verification-Aware Speculative Decoding for Efficient Multi-Step Reasoning
04-19
中
SpecGuard: Verification-Aware Speculative Decoding for Multi-Step Reasoning
04-15
EN
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding — Deep Technical Review
04-15
中
LayerSkip: Making "Early Exit + Self-Verifying Inference" a Deployable Solution for Large Models (In-Depth Reading Notes)
03-11
EN
Speculative Decoding: Making LLM Inference 2-3× Faster Without Losing a Single Token