Experience & Projects
Professional Experience & Industry Research Projects
Future System Architecture (FSA) Lab, The University of Sydney (USYD) Mar. 2022 - Present
Visiting Scholar, Ph.D. student Sydney, Australia
Advisors: Shuaiwen Song (Associate Professor, USYD), Chang Xu (Associate Professor, USYD), Yibo Yang (Research Scientist at JD Explore Academy)
Research Projects:
RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Models
Motivation: Text-to-image synthesis has become increasingly popular in AI-generated content (AIGC) and computer graphics. However, no comprehensive survey systematically introduces the frameworks and ideas behind text-to-image techniques. We aim to fill this gap in the literature.
Contributions:
Read over 100 papers, providing a literature review for each.
Collaborated with lab classmates to write the comprehensive survey paper.
Optimization of Diffusion Model Denoising Process
Motivation: Diffusion models currently require a large number of denoising steps, which we aim to reduce. One reason for the lengthy process is the lack of a clear relationship between the noise and the trained image. Our goal is to explore methods beyond guidance techniques, such as incorporating text embeddings into the raw noise, that establish a connection between the noise and the denoised image.
Contributions:
Developed new ideas, implemented them, and conducted comparative experiments to evaluate their performance.
Exploring Neural Collapse Phenomenon in Reinforcement Learning
Motivation: In reinforcement learning, agents may exhibit biased action selection in the environment due to incomplete understanding of the state and action distribution spaces. This research investigates whether the neural collapse phenomenon occurs in policy gradient networks as agents train with sufficient examples and examines its implications for balancing action selection in reinforcement learning agents.
Contributions:
Conducted experiments applying ETF classifiers to 5+ neural networks in 10+ discrete-action reinforcement learning environments (e.g., Atari, Gym Classic).
Derived and proved the formula and geometric properties of the policy gradient loss function.
Authored the paper drafts and submitted them to the NeurIPS conference.
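For readers unfamiliar with ETF classifiers, the following is a minimal sketch of the standard simplex equiangular tight frame (ETF) construction from the neural collapse literature, not the code used in this project. It assumes one classifier vector per discrete action; the function names `simplex_etf`, `column`, and `cosine` are illustrative.

```python
import math


def simplex_etf(num_classes):
    """Build the K x K simplex ETF matrix sqrt(K/(K-1)) * (I - (1/K) * 1 1^T).

    Column k is the fixed classifier vector for class (here: action) k.
    """
    k = num_classes
    if k < 2:
        raise ValueError("a simplex ETF needs at least two classes")
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def column(matrix, j):
    """Return column j of a matrix stored as a list of rows."""
    return [row[j] for row in matrix]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    etf = simplex_etf(4)
    # Every pair of distinct class vectors has cosine -1/(K-1).
    print(cosine(column(etf, 0), column(etf, 1)))
```

Each column has unit norm and every pair of distinct columns has cosine similarity -1/(K-1), which is the maximally separated, equiangular geometry that neural collapse predicts for last-layer classifiers.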
Sparse Kernel Design on GPU Tensor Cores
Motivation: With the application of pruning methods, neural network weight matrices become increasingly sparse, but there is no implementation of sparse kernels on GPU Tensor Cores.
Contributions:
Conducted comparative experiments between our sparse kernel and Google’s Sputnik.
Summarized experiment results and figures in the paper.
DeepSpeed I/O Framework Support for AI4Science
Motivation: AI4Science models have revolutionized the AI world. DeepSpeed can support AI4Science models deployed across multiple nodes but lacks an I/O management framework for handling large amounts of training data efficiently.
Contributions:
Investigated DeepSpeed I/O support on supercomputers (Argonne HDF5 Lustre system), analyzed data shuffling and fetching patterns for AI4Science models powered by DeepSpeed, and implemented algorithms to accelerate I/O.
Implemented a ViT model for weather prediction.
Industry Projects:
DeepSpeed Chat: Easy, Fast, and Affordable RLHF Training of ChatGPT-like Models at All Scales (Microsoft DeepSpeed Team)
Motivation: ChatGPT-like models have revolutionized the AI world, but an accessible end-to-end RLHF pipeline for training powerful ChatGPT-like models is still lacking within the AI community.
Contributions:
Investigated ColossalAI’s pipeline, learned how to use ColossalAI’s ZeRO-2, ZeRO-3, and GeminiDDP, and adapted them for our RLHF algorithm.
Ran 400+ benchmark experiments for DeepSpeed Chat, ColossalAI, and HuggingFace powered by native PyTorch. Summarized the results and conclusions in the DeepSpeed blog.
Revised the DeepSpeed GitHub landing page and the DeepSpeed Chat blog, and produced the DeepSpeed Chat video.
School of Computer Science and Engineering, SYSU Sep. 2018 - Mar. 2022
Research Assistant Guangzhou, China
Advisors: Dan Huang (Associate Professor, SYSU), Yunfei Du, Yutong Lu (Professor, SYSU)
Research Projects:
Pre-Expedite: Use Hierarchical Structure Space for Improving the Performance of Accessing Small Files in Parallel File System - Undergraduate Thesis
Motivation: Reduce clients’ I/O communication with the metadata server (MDS) using minimal additional client-side resources, while keeping high usability by leaving the POSIX standard unmodified.
Contributions:
Investigated the I/O bottleneck in parallel/distributed file systems for Big Data and Artificial Intelligence applications, identifying intensive metadata communication with the metadata server as a primary issue.
Used POSIX interfaces to create ZERO file blocks (loop devices) and established a VFS within them, allowing each user to store small files in their designated ZERO file blocks.
HybridShare: Universal Resource Scheduling for Hybrid Jobs
Motivation: CPU- and GPU-centric applications allocate resources exclusively, leading to inefficient utilization of heterogeneous resources.
Contributions:
Analyzed the feasibility of co-locating modern workflows and applications on the same physical machine to share resources.
Proposed HybridShare algorithms that enable jobs with different resource preferences (e.g., GPU-centric, CPU-centric, memory-intensive) to be co-located on the same node and share hardware resources through Slurm, Mesos, and Kubernetes.
MAEM - Multiple Applications co-Execution time Estimation
Motivation: Few works accurately estimate the slowdown of CPU/GPU applications based on the characteristics of the applications and the hardware architecture.
Contributions:
Conducted a literature review on application profiling, interference and slowdown estimation, and interference-aware scheduling.
Gathered resource consumption data for various benchmarks and analyzed their behavior.
Institute of Advanced Networks and Computing Systems, SYSU Oct. 2018 - Mar. 2019
Research Intern Guangzhou, China
Advisor: Hejun Wu (Associate Professor, SYSU)
Research Projects:
EmReal: A Digital Twin Framework of Emulated and Real Components for Robots with Reinforcement Learning
Motivation: Transferring reinforcement learning (RL) algorithms from simulators to actual robots is still at a nascent stage. EmReal is a digital twin framework for robots using RL that bridges the gap between simulation and real-world deployment.
Contributions:
Conducted a survey on robotics simulator systems and reinforcement learning algorithms.
Designed and implemented a one-legged robot, integrating real and emulated components using XML, Python, ROS, and Arduino C programming.
Created a digital twin framework for robotic systems, employing reinforcement learning (RL) and seamlessly blending emulation, pre-training, connectivity, and hardware adaptation using ROS and PyBullet.
Co-authored a book on deep learning in reinforcement learning, awaiting publication.
Tencent Holdings Ltd. Weixin Group & Dep. of CS UIUC Jul. 2018 - Jul. 2020
Research Intern, Testing, Technical-Architecture Department Champaign, IL, US & Guangzhou, China
Advisors: Tao Xie (Professor and Willett Faculty Scholar, UIUC), Yuetang Deng (Director)
Industry Projects:
JSidentify: A Hybrid Framework for Detecting Plagiarism Among JavaScript Code in Online Mini Games
Motivation: In cases of plagiarism for mini-games, deeply obfuscated code cloned from the original code often embodies malicious code segments and copyright infringements, posing great challenges for existing plagiarism detection tools. To address these challenges, we design and implement JSidentify, a hybrid framework to detect plagiarism among online mini games.
Contributions:
Worked under the guidance of Prof. Tao Xie, focusing on intermediate representation analysis in the interpreter of V8 (the JavaScript engine used by Node.js).
Conducted a literature review on code plagiarism detection methods and evaluations of clone detection tools.
Developed an edit distance estimation and network flow algorithm to measure similarity in the bytecode generated by V8’s Ignition interpreter and TurboFan compiler.
Designed a priority-queue-based framework to consolidate multiple plagiarism detection algorithms.
Co-authored a paper titled “JSidentify: A Hybrid Framework for Detecting Plagiarism Among JavaScript Code in Online Mini Games.”
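As a rough illustration of edit-distance-based similarity over bytecode, the sketch below computes plain Levenshtein distance between two opcode sequences and normalizes it to a score in [0, 1]. It is a textbook baseline under stated assumptions, not JSidentify's estimation or network flow algorithm; the function names and the sample opcode lists are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (e.g., lists of opcode names)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, token_a in enumerate(a, start=1):
        current = [i]
        for j, token_b in enumerate(b, start=1):
            cost = 0 if token_a == token_b else 1
            current.append(min(
                previous[j] + 1,         # delete token_a
                current[j - 1] + 1,      # insert token_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    # Hypothetical opcode sequences for an original and a suspected clone.
    original = ["LdaSmi", "Star", "Ldar", "Add", "Return"]
    suspect = ["LdaSmi", "Star", "Ldar", "Sub", "Return"]
    print(similarity(original, suspect))
```

Comparing opcode sequences rather than source text is what makes this kind of measure less sensitive to identifier renaming and formatting changes introduced by obfuscation.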
Microsoft (China) Co., Ltd. Guangzhou Branch Sep. 2018 - Feb. 2019
Project Assistant to Senior Cloud Architect Guangzhou, China
Advisor: Zhen Guan (Sr. Partner Technology Strategist, Microsoft)
Gained proficiency in Azure’s architecture and utilized Azure for training multiple machine learning models.
Developed a textile-focused Q&A system to address a market gap in China:
Collected Q&A data by crawling prominent domestic textile websites.
Preprocessed data through cleaning, serializing, and tokenizing text into a corpus.
Implemented a pre-trained BERT model for the Q&A system.
Deployed the BERT model on Azure as a service.
SYSU-CMU Joint Institute of Engineering (JIE) Feb. 2017 - Aug. 2017
Research & Software Engineer Intern Guangzhou, China
Advisor: Xiaoyin Tang (Professor, Southern University of Science and Technology)
Created a front-end website to integrate with a back-end deep learning model for efficient analysis of numerous fundus photographs.
Enabled detection of diabetic retinopathy (DR) and diabetic macular edema (DME) through seamless collaboration between the front-end and back-end systems.
Computational Medical Imaging Laboratory, SYSU Jul. 2016 - Aug. 2017
Research Intern Guangzhou, China
Advisor: Yao Lu (Professor, SYSU)
Collected breast cancer data through web crawling with Scrapy.
Developed an OHIF Viewer web project, available at LINK.
Led an SIT (College Students’ Innovative Entrepreneurial Training Plan) project, ID: 201502059.
Implemented traditional image processing algorithms on mobile platforms.
Other Projects
LeetCode Record Jun. 2017 - Present
Honing Programming Skills Daily
Solved LeetCode algorithm questions in C, C++, Python 3, Java, and Go, choosing the language per problem.
Maintained a repository containing my code and insights for each LeetCode problem.
System Related Conference Papers Crawler Jun. 2021 - Present
Web Scraper and Timeline for Top-tier Systems Conferences
Leveraged Python, BeautifulSoup4, and Requests to scrape papers and crucial deadlines for major computer systems conferences.
Employed Pandas and Matplotlib to create a timeline representing significant computer system paper submission deadlines.
DDLs Dec. 2017 - May. 2018
Course Project: Design and Development of Android Applications Guangzhou, China
Developed DDLs, an Android application for personal deadline management, using Java and Android Studio with an MVC architecture for the front-end, and Node.js with Express.js for the back-end RESTful API.
Implemented deadline management with CRUD operations on a timeline (adding, completing, and deleting deadlines, and marking completed deadlines as unfinished) using SQLite for local storage; server notifications through WebSocket; timeline screenshot sharing using Android’s native sharing capabilities; and user authentication with JSON Web Tokens (JWT) for registration and login.
ChainLoveHelp May. 2018
South China Microsoft Hackathon Competition Guangzhou, China
ChainLoveHelp is dedicated to providing a peer-to-peer platform for university task posting and processing based on blockchain technology.
For the chain-end, employed Ethereum-based Parity to construct a consortium blockchain, operating two nodes on the chain for transaction processing, accounting, and consensus.
For the front-end, built the web stack with PHP for server-side scripting, Apache as the web server, and MySQL for database management.
GuangTu Apr. 2017 - May. 2017
South China Microsoft Hackathon Competition Guangzhou, China
GuangTu is a Windows-based map planning application that uses gesture recognition for user interaction.
Developed the application in Python, with Leap Motion for gesture recognition, PyQt5 for the graphical user interface, and Django for the web backend.
Seven Seconds Apr. 2017 - May. 2017
SYSU Student Software Creative Design and Innovation Development Competition Guangzhou, China
Designed and developed an Android App to organize and record memories, leveraging the capabilities of Android Studio and Java. Successfully published the app on the 360 Mobile App Market.
Implemented the mobile app architecture, including a sidebar, homepage, memory management, and login and registration modules, using RESTful APIs backed by a Node.js server for data processing and storage.
PVmedtech Jul. 2016 - Aug. 2017
Advisor: Yao Lu (Professor, SYSU) Guangzhou, China
Collected breast cancer data through web crawling with Scrapy.
Developed an OHIF Viewer web project, available at LINK.
Led an SIT (College Students’ Innovative Entrepreneurial Training Plan) project, ID: 201502059.
Implemented traditional image processing algorithms on mobile platforms.