Tag

#Mixture of Experts

16 posts tagged with this label.

2026

06-28 EN

Moebius: Seamless Runtime Parallelism Switching for MoE LLM Serving
06-28 ZH

Moebius: Seamless Runtime Parallelism-Strategy Switching for MoE LLM Inference Serving
06-13 EN

ForeMoE: Micro-step-level MoE Load Balancing for RL Post-training via Routing Foresight
06-13 ZH

ForeMoE: Micro-step-level MoE Load Balancing in RL Post-training via Routing Foresight
05-16 EN

An Interpretable Latency Model for Speculative Decoding in LLM Serving — Technical Review
05-16 ZH

Explaining the Speedup Curve of Speculative Decoding in Real-World Serving with Little's Law (Reading Notes)
05-14 EN

DisagMoE: Disaggregating Attention and FFN to Beat the MoE All-to-All Bottleneck
05-14 ZH

DisagMoE: Breaking Through the All-to-All Bottleneck in MoE Training by Disaggregating Attention and FFN
05-07 EN

Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism
04-29 EN

FEPLB Technical Review: Nearly Free MoE Load Balancing with the NVLink Copy Engine
04-24 EN

FEPLB: Zero-Cost MoE Load Balancing via NVLink Copy Engine
04-14 EN

Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts — Deep Technical Review
04-14 ZH

ArmoRM: Interpretable Preference Learning with Multi-Objective Reward Modeling and Mixture-of-Experts Gating (In-Depth Reading Notes)
04-04 EN

Switch Transformers: Scaling to Trillion-Parameter Sparse Models — In-Depth Technical Review
04-04 ZH

Switch Transformers: Scaling to Trillion-Parameter Models with Simple and Efficient Sparsity (In-Depth Reading Notes)
02-18 EN

DeepSeek-V2: Multi-head Latent Attention and DeepSeekMoE — Technical Review