Showing posts 73–84 of 116. Each entry opens locally on this site; legacy Hexo posts link back to their original article at the bottom for reference.
2026
- EN
Speculative Decoding: Making LLM Inference 2-3× Faster Without Losing a Single Token
A technical review of Speculative Decoding, analyzing how a smaller draft model that proposes tokens and a larger model that verifies them together achieve a 2-3x inference speedup while leaving the output distribution mathematically identical.
- EN
InstructGPT: The RLHF Recipe That Turned GPT-3 Into a Helpful Assistant
A detailed technical review of InstructGPT (OpenAI, 2022), analyzing how a three-stage Reinforcement Learning from Human Feedback (RLHF) pipeline (SFT, reward modeling, and PPO) turned next-token prediction into instruction-following behavior aligned with human intent.
- EN
AutoGen: Microsoft's Framework for Building Multi-Agent Conversations That Actually Work
A technical review of AutoGen, examining how multi-agent conversation frameworks with customizable agents enable complex LLM applications through cooperative dialogue patterns.
- EN
Generative Agents: 25 AI Characters Living in a Simulated Town — Believable Human Behavior from LLMs
A technical review of Generative Agents, analyzing how LLM-powered agents with memory, reflection, and planning create believable simulations of human behavior in interactive sandbox environments.
- EN
SWE-agent: Turning LLMs Into Autonomous Software Engineers That Fix Real GitHub Issues
A technical review of SWE-agent, analyzing how Agent-Computer Interface (ACI) design principles enable LLM agents to autonomously resolve real-world GitHub issues.
- EN
Self-Refine: Teaching LLMs to Critique and Improve Their Own Output — No Extra Training Needed
A technical review of Self-Refine, analyzing how iterative self-feedback loops enable LLMs to progressively improve their own outputs without external training or human supervision.
- EN
DeepSeekMath: How 120B Tokens of Math Data and GRPO Rival GPT-4 on Competition Problems
A detailed technical review of DeepSeekMath, analyzing how continued pretraining on math corpora combined with Group Relative Policy Optimization (GRPO) enables a 7B open model to rival frontier systems on challenging math benchmarks.
- EN
Reflexion: LLM Agents That Learn from Failure Through Verbal Self-Reflection
A technical review of Reflexion, exploring how language agents use verbal self-reflection as reinforcement signals to iteratively improve performance on coding, reasoning, and decision-making tasks.
- EN
vLLM and PagedAttention: Efficient Memory Management for Large Language Model Serving — Technical Review
A detailed technical review of the vLLM paper, which introduces PagedAttention—a novel attention algorithm inspired by OS virtual memory paging—to eliminate KV cache memory waste and dramatically increase LLM serving throughput.
- EN
AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning — Technical Review
A detailed technical review of the AdaLoRA paper, which proposes adaptive budget allocation for parameter-efficient fine-tuning by parameterizing weight updates via SVD and dynamically pruning singular values based on importance scoring.
- EN
GLM-5 Technical Review: From Vibe Coding to Agentic Engineering
A detailed technical review of GLM-5's agentic engineering approach, analyzing how Zhipu AI moved from 'vibe coding' to systematic agent-driven software development with real-world deployment insights and lessons learned.
- EN
DeepSeek-V2: Multi-head Latent Attention and DeepSeekMoE — Technical Review
A detailed technical review of the DeepSeek-V2 architecture, focusing on two components: Multi-head Latent Attention (MLA), which cuts the KV cache by 93.3% relative to DeepSeek 67B through low-rank joint compression of keys and values, and DeepSeekMoE, which enables economical training through fine-grained expert segmentation and shared expert isolation.