Zhen Qin (秦臻)
Assistant Professor
School of Software Technology, Zhejiang University, China

I obtained my PhD in 2025 from the College of Computer Science and Technology at Zhejiang University, under the supervision of Professor Shuiguang Deng. From Sep. 2024 to Mar. 2025, I was a visiting PhD student at the School of Computing, National University of Singapore, under the supervision of Prof. Bingsheng He. From Jun. 2023 to Aug. 2025, I was a research intern at Tongyi Lab, Alibaba Group.

My research interests include 1) data provision for and with LLMs, 2) federated learning and federated fine-tuning of LLMs, and 3) LLM-based multi-agent systems. So far, I have published more than 10 CCF-A-ranked papers in venues such as IEEE TPAMI, IEEE TMC, IEEE TSC, ICML, KDD, WWW, AAAI, ACL, and ICDE.

Students who are interested in joining our research group are welcome to contact me at zhenqin@zju.edu.cn.


Education
  • Zhejiang University
    College of Computer Science and Technology
    Ph.D.
    Sep. 2021 - Mar. 2025
  • Shanghai University
    M.S. in Computer Science
    Sep. 2018 - Jul. 2021
  • Shanghai University
    B.S. in Computer Science
    Sep. 2014 - Jul. 2018
Honors & Awards
  • Excellent Graduate of Zhejiang Province
    2025
  • National Scholarship for Doctoral Students (Zhejiang University)
    2024
  • The Best Research Award in Data Science of Zhejiang University
    2024
  • Excellent Graduate of Shanghai
    2021
  • National Scholarship for Master's Students (Shanghai University)
    2020
  • National Scholarship for Master's Students (Shanghai University)
    2019
News
2025
  • 💼 Joined the School of Software Technology at Zhejiang University as an Assistant Professor.
    Aug 25
  • 🎉 One coauthored paper was accepted by IEEE TMC.
    Jul 19
  • 🎉 One paper with me as corresponding author was accepted by ACM MM 2025.
    Jul 06
  • 🎉 One survey paper was accepted by IEEE TPAMI.
    Jun 01
  • 🎉 Two papers were accepted by ACL 2025.
    May 15
Selected Publications
The Synergy Between Data and Multi-Modal Large Language Models: A Survey From Co-Development Perspective

Zhen Qin†, Daoyuan Chen†, Wenhao Zhang, Liuyi Yao, Yilun Huang, Bolin Ding, Yaliang Li*, Shuiguang Deng* († equal contribution, * corresponding author)

IEEE Transactions on Pattern Analysis and Machine Intelligence 2025

Recent years have witnessed the rapid development of large language models (LLMs). Multi-modal LLMs (MLLMs) extend the modality from text to various domains, attracting widespread attention due to their diverse application scenarios. As LLMs and MLLMs rely on vast amounts of model parameters and data to achieve emergent capabilities, the importance of data is gaining increasing recognition. Reviewing recent data-driven works for MLLMs, we find that the development of models and data is not two separate paths but rather interconnected: larger volumes of higher-quality data improve MLLM performance, while MLLMs, in turn, facilitate the development of data. The co-development of multi-modal data and MLLMs requires a clear view of 1) at which development stages of MLLMs specific data-centric approaches can be employed to enhance certain MLLM capabilities, and 2) how MLLMs, using these capabilities, can contribute to multi-modal data in specific roles. To promote data-model co-development for MLLM communities, we systematically review existing works on MLLMs from the data-model co-development perspective.

Federated Data-Efficient Instruction Tuning for Large Language Models

Zhen Qin, Zhaomin Wu*, Bingsheng He, Shuiguang Deng (* corresponding author)

Annual Meeting of the Association for Computational Linguistics (ACL) 2025

Instruction tuning helps improve the responsiveness of pretrained large language models (LLMs) to human instructions and benefits from diversified instruction data. Federated learning extends the sources of instruction data by exploiting diversified client-side data, making it increasingly popular for tuning LLMs. Existing approaches to federated LLM tuning typically traverse all local data during local training, incurring excessive computation overhead and posing a risk of overfitting to local data. Thus, a federated data-efficient instruction tuning approach, which consumes relatively little data from the entire dataset, is needed. In response, this work introduces FedHDS, a federated data-efficient instruction tuning approach that tunes the LLM on a representative subset of edge-side data, i.e., a coreset. It reduces data-sample redundancy at both the intra-client and inter-client levels through a hierarchical data selection framework that jointly selects a small number of representative samples for local training without sharing the raw data. Extensive experiments conducted across six scenarios with various LLMs, datasets and data partitions demonstrate that FedHDS significantly reduces the amount of data required for fine-tuning while improving the responsiveness of the instruction-tuned LLMs to unseen tasks.
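
To make the idea of a representative coreset concrete, here is a minimal, hypothetical sketch of an intra-client selection step, assuming k-means clustering over sample embeddings with one representative kept per cluster; the embedding source, the clustering method, and the one-per-cluster rule are illustrative assumptions, not FedHDS's exact hierarchical criteria, and the inter-client level is omitted.

```python
# Hypothetical intra-client coreset selection: cluster instruction-sample embeddings
# with k-means and keep the sample closest to each centroid, so only a small
# representative subset participates in local training. This is a sketch under the
# assumptions stated above, not the FedHDS algorithm itself.
import numpy as np
from sklearn.cluster import KMeans


def select_local_coreset(embeddings: np.ndarray, num_clusters: int) -> np.ndarray:
    """Return the indices of one representative sample per cluster."""
    km = KMeans(n_clusters=num_clusters, n_init=10, random_state=0).fit(embeddings)
    coreset = []
    for c in range(num_clusters):
        members = np.where(km.labels_ == c)[0]
        # Choose the member nearest to its cluster centroid as the representative.
        dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1)
        coreset.append(members[np.argmin(dists)])
    return np.asarray(coreset)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    client_embeddings = rng.normal(size=(1000, 64))  # stand-in for instruction embeddings
    subset = select_local_coreset(client_embeddings, num_clusters=16)
    print(f"selected {len(subset)} of {len(client_embeddings)} local samples")
    # Only these selected samples (or summaries of them) would inform any
    # inter-client deduplication step; raw data never leaves the client.
```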

Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes

Zhen Qin, Daoyuan Chen, Bingchen Qian, Bolin Ding, Yaliang Li*, Shuiguang Deng* (* corresponding author)

International Conference on Machine Learning (ICML) 2024

Pre-trained large language models (LLMs) require fine-tuning to improve their responsiveness to natural language instructions. Federated learning (FL) offers a way to perform fine-tuning using the abundant data on end devices without compromising data privacy. Most existing federated fine-tuning methods for LLMs rely on parameter-efficient fine-tuning techniques, which may not reach the performance heights possible with full-parameter tuning. However, the communication overhead associated with full-parameter tuning is prohibitively high for both servers and clients. This work introduces FedKSeed, a novel approach that employs zeroth-order optimization (ZOO) with a set of random seeds. It enables federated full-parameter tuning of billion-sized LLMs directly on devices. Our method significantly reduces transmission requirements between the server and clients to just a few scalar gradients and random seeds, amounting to only a few thousand bytes. Building on this, we develop a strategy to assess the significance of ZOO perturbations for FL, allowing for probability-differentiated seed sampling. This prioritizes perturbations that have a greater impact on model accuracy. Experiments across six scenarios with different LLMs, datasets and data partitions demonstrate that our approach outperforms existing federated LLM fine-tuning methods in terms of both communication efficiency and new task generalization.
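
As a rough, self-contained illustration of the seed-based mechanism described above, the toy sketch below runs zeroth-order SGD on a quadratic objective, where each step can be reconstructed from a random seed plus a single scalar; the objective, dimensionality, learning rate, and seed schedule are hypothetical stand-ins rather than the paper's actual procedure.

```python
# Toy sketch of seed-based zeroth-order optimization (ZOO): each update is fully
# described by a random seed plus one scalar directional gradient, which is all
# that would need to be transmitted. Loss, dimensions, and hyperparameters are
# illustrative placeholders, not the FedKSeed setup.
import numpy as np


def loss(w: np.ndarray) -> float:
    return float(np.sum((w - 1.0) ** 2))  # stand-in for an LLM training loss


def zoo_grad(w: np.ndarray, seed: int, eps: float = 1e-3) -> float:
    """Two-point estimate of the gradient along a seed-defined random direction."""
    z = np.random.default_rng(seed).standard_normal(w.shape)
    return (loss(w + eps * z) - loss(w - eps * z)) / (2.0 * eps)


def apply_update(w: np.ndarray, seed: int, grad: float, lr: float = 1e-3) -> np.ndarray:
    """Any party can replay the update from (seed, grad) by regenerating z."""
    z = np.random.default_rng(seed).standard_normal(w.shape)
    return w - lr * grad * z


if __name__ == "__main__":
    w = np.zeros(100)  # tiny toy model; the message size does not depend on model size
    for step in range(2000):
        seed = step  # FedKSeed instead samples seeds from a small shared candidate pool
        g = zoo_grad(w, seed)         # a client would send only (seed, g): a few bytes
        w = apply_update(w, seed, g)  # the server and peers replay the same step
    print(f"loss after 2000 steps: {loss(w):.4f}")
```

Because every node regenerates the perturbation from the shared seed, the per-step communication stays at a few bytes regardless of how large the model is, which is the property the paper exploits for billion-sized LLMs.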
