Recent News
All updates in reverse chronological order.
- 2026/06/19 Project OSCAR has been covered by multiple tech media outlets, including Towards AI, ModelScope, MarkTechPost, QbitAI (量子位), and Synced (机器之心), reaching 100,000+ reads across platforms. Grateful for the community interest in deployable 2-bit KV-cache quantization.
- 2026/06/18 Project Released Taylor-Calibrate: Principled Initialization for Hybrid Linear Attention Distillation. The codebase distills Qwen/Llama softmax-attention Transformers into hybrid linear-attention students built on GatedDeltaNet, using Taylor-series-informed initialization before staged distillation.
- 2026/06/17 Service Nominated as a reviewer for the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026). Looking forward to contributing to the NLP and efficient language-model community.
- 2026/06/15 Paper Our paper OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization has been accepted to the SELVA workshop at ACL 2026!
- 2026/06/10 Project Huge thanks to the open-source community — OSCAR has reached 500 stars on GitHub and now supports local llama.cpp usage, making 2-bit KV-cache quantization easier to try on local LLM deployments.
- 2026/05/29 Service Excited to help organize the 1st International Workshop on Sustainable and Efficient Language, Vision, and Action Models (SELVA 2026) at ACL 2026 — joining as a Program Committee member. SELVA brings together researchers working on efficient foundation models, multimodal systems, model compression, efficient inference, embodied AI, and related topics. Looking forward to reading the submissions!
- 2026/05/29 Talk I will be giving a talk at the AI-tonomy Summit (Models & Agents) on June 5 at the Plug and Play Tech Center in Silicon Valley — speaking in the AI Researcher Forum on Frontier Research: Scaling Agentic RL & Verified Reasoning. Looking forward to connecting with everyone building the next phase of agentic systems!
- 2026/05/26 Project Huge thanks to the open-source community: OSCAR has crossed 100 ⭐ on GitHub in its first week! Grateful to everyone who tried the rotation zoo, filed issues, or shared the project. The 2-bit KV-cache work is just getting started; more SGLang integration and additional reasoning-model recipes are coming.
- 2026/05/19 Project Thrilled to release OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization. It enables INT2 KV-cache serving at 2.28 effective bits per element, with ~8× KV-memory reduction and up to ~7× higher large-batch throughput at near-BF16 accuracy on Qwen3-4B/8B/32B and GLM-4.7-FP8.
- 2026/05/18
- 2026/04/03 Event I will be attending the International Conference on Learning Representations (ICLR 2026) in Rio. Looking forward to connecting with amazing researchers — see you in Rio!
- 2026/02/25
- 2026/02/25
- 2026/02/04
- 2026/01/30 Event I will be attending HPCA / PPoPP / CGO / CC 2026 as the web chair of HPCA 2026. Happy to see everyone there.
- 2026/01/27 Paper Our paper KITTY: Accurate and Efficient 2-bit KV Cache Quantization with Dynamic Channel-wise Precision Boost has been accepted by MLSys 2026! Congrats to Haojun!
- 2026/01/27 Paper Our paper CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention has been accepted by ICLR 2026!
- 2025/11/13 Talk I will join a panel discussion on AI in the Cross-Border Digital Technologies Ecosystem: Infrastructure, Platforms & Global Launchpads.
- 2025/10/01 Service Joined the organizing committee of the 32nd IEEE International Symposium on High-Performance Computer Architecture (HPCA 2026) as Web Chair.
- 2025/09/02 Paper My first theoretical work on RL has been released on arXiv. Feel free to reach out to discuss it!
- 2025/07/18 Full Time I will join Together Computer as a Senior AI Researcher on January 12, 2026.
- 2025/07/01 Service Nominated as a reviewer for the Thirty-ninth Conference on Neural Information Processing Systems (NeurIPS 2025).
- 2025/06/15 Honor I passed my Progress Evaluation Meeting with a rating of "satisfactory or excellent". Thank you, USYD! Now preparing my thesis!
- 2025/05/02 Paper Our paper Ladder-Residual has been accepted by ICML 2025. Congrats to Muru. See you in Vancouver, Canada!
- 2025/04/24 Event I will be attending the International Conference on Learning Representations (ICLR 2025) in Singapore. Looking forward to connecting with amazing researchers — see you in Singapore!
- 2025/03/16 Event I will be attending the NVIDIA GPU Technology Conference (GTC 2025) in San Jose, California. Excited to meet fellow GPU enthusiasts — see you there!
- 2024/12/07 Paper Our survey, RenAIssance, has finally been accepted by TPAMI! Congratulations to Bobby!
- 2024/09/26 Paper Our paper, CorDA, has been accepted by NeurIPS 2024.
- 2024/07/10 Event I will be attending the USENIX Annual Technical Conference (ATC 2024) online. Looking forward to connecting!
- 2024/06/15 Honor I passed my Progress Evaluation Meeting with a rating of "satisfactory or excellent". Thank you, USYD!
- 2024/06/06 Paper Our paper, Quant-LLM, has been accepted by USENIX ATC 2024. Congratulations to Haojun again!
- 2024/05/01 Intern Joined Together Computer as a Research Consultant.
- 2024/03/01 Intern Joined Dolby as a Research Intern.
- 2023/11/22 Honor Received the APR Intern Program Scholarship (SC3600). Thank you USYD and Dolby!
- 2023/10/06 Paper DeepSpeed4Science has been released! Hope to participate in more AI4Science projects!
- 2023/07/13 Service Joined the Artifact Evaluation Committee of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'24 AEC).
- 2023/06/20 Paper Our paper, Flash-LLM, has been accepted by VLDB 2024. Congratulations to Haojun!
- 2023/06/15 Honor I passed my Progress Evaluation Meeting with a rating of "satisfactory or excellent". Thank you, Dolby and USYD!
- 2023/04/04 Paper The DeepSpeed Chat project I participated in has reached #11 on Zhihu and #7 on GitHub this week — DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales.
- 2023/04/03 Service Nominated as a reviewer for the Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023).
- 2023/03/25 Event I will be attending the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2023) online. Excited to see you there!
- 2023/03/01 Intern Joined the DeepSpeed team at Microsoft as a Research Intern.
- 2023/03/01 Service Joined the Program Committee of Computer Science Research Methods (CSRM 2023), INFO5993 / INFO4990 at the University of Sydney.
- 2022/11/13 Event I will be attending the International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2022) online. Looking forward to engaging with the global HPC community!
- 2022/10/01 Honor Received the Jingdong Technology (JD) Co Ltd Research Scholarship in Artificial Intelligence. Thank you JD and USYD!
- 2021/12/16 Event I will be attending the China National Computer Congress (CNCC 2021) online.
- 2021/11/14 Event I will be attending the International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2021) online. See you online at SC — always an inspiring HPC gathering!
- 2021/07/01 Honor Received the SYSU Overseas Visiting and Collaborative Research Program Funding Plan. Thank you SYSU!
- 2021/06/24 Event I will be attending the ISC High Performance conference (ISC 2021) online. Looking forward to exchanging ideas with HPC experts worldwide!
- 2021/06/14 Event I will be attending the International Symposium on Computer Architecture (ISCA 2021) online.
- 2020/07/11 Talk I will give a talk on JSidentify at ICSE 2020. Looking forward to connecting!
- 2020/05/23 Event I will be attending the International Conference on High Performance Big Data and Intelligent Systems (HPBD&IS 2020) online.