Tag

#Model Compression

19 posts tagged with this label.

2026

06-26 EN

SigmaScale: Learning to Scale Weight Matrices for Better SVD-Based LLM Compression
06-26 ZH

SigmaScale Reading Notes: Improving SVD-Based LLM Compression by Learning Scaling Matrices
06-19 EN

LASER: How Throwing Away 99% of a Weight Matrix Can Make LLMs Smarter
06-19 ZH

LASER: Discard 99% of the Matrix Rank, and LLM Reasoning Accuracy Actually Rises by 27%
06-12 EN

SliceGPT: Post-Training LLM Compression via Computational Invariance
06-12 ZH

SliceGPT Reading Notes: Deleting Transformer Rows and Columns via Computational Invariance
05-29 EN

IO-SVD: Input-Output Whitened SVD for Adaptive-Rank LLM Compression
05-29 ZH

IO-SVD: An Adaptive-Rank LLM Compression Method Based on Two-Sided Input-Output Whitening
05-15 EN

Zero Sum SVD: A Global, Loss-Aware Rank Budget for LLM Compression
05-15 ZH

Zero Sum SVD: An LLM Compression Method That Uses a "Zero-Sum Loss" Rule for Global Singular-Value Budget Allocation
05-08 EN

Swift-SVD: Activation-Aware Low-Rank Compression for LLM Weights and KV Cache
04-17 EN

GRASP Technical Review: Replacing Redundant LLM Layers with Adaptive Singular Parameters
04-10 EN

SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression — Deep Technical Review
04-10 ZH

SVD-LLM: A "Truncation-Aware" Singular Value Decomposition Method for Large Language Model Compression (In-Depth Reading Notes)
04-03 EN

AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration — In-Depth Technical Review
04-03 ZH

AWQ: Activation-Aware Weight Quantization for Large-Model Compression and Acceleration (In-Depth Reading Notes)
04-01 EN

Layer Pruning for Efficient Large Language Models — In-Depth Technical Review
03-25 EN

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers — In-Depth Technical Review
03-21 EN

BitNet: Scaling 1-bit Transformers for Large Language Models — In-Depth Technical Review