Tag

#Speculative Decoding

17 posts tagged with this label.

2026

07-01 EN

SSV: Sparse Speculative Verification for Efficient LLM Inference
07-01 ZH

SSV: Sparse Speculative Verification (Speculative Decoding under Dynamic Sparse Attention)
06-27 EN

JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting
06-27 ZH

JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting
05-17 EN

PipeSD: Cloud-Edge Collaborative Pipeline Inference with Speculative Decoding — Technical Review
05-17 ZH

PipeSD: A Cloud-Edge Collaborative Pipeline Inference Framework Based on Speculative Decoding (Reading Notes)
05-16 EN

An Interpretable Latency Model for Speculative Decoding in LLM Serving — Technical Review
05-16 ZH

Explaining the Speedup Curve of Speculative Decoding in Real-World Serving with Little's Law (Reading Notes)
05-14 EN

DisagMoE: Disaggregating Attention and FFN to Beat the MoE All-to-All Bottleneck
05-14 ZH

DisagMoE: Breaking Through the All-to-All Bottleneck in MoE Training by Disaggregating Attention and FFN
05-10 EN

Tutti: Making SSD-Backed KV Cache Practical for Long-Context LLM Serving
05-10 ZH

Tutti: Making SSD-Backed KV Cache Truly Practical for Long-Context LLM Serving
04-19 EN

SpecGuard: Verification-Aware Speculative Decoding for Efficient Multi-Step Reasoning
04-19 ZH

SpecGuard: Verification-Aware Speculative Decoding for Multi-Step Reasoning
04-15 EN

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding — Deep Technical Review
04-15 ZH

LayerSkip: Making "Early Exit + Self-Verifying Inference" a Deployable Option for Large Models (In-Depth Reading Notes)
03-11 EN

Speculative Decoding: Making LLM Inference 2-3× Faster Without Losing a Single Token