Wei Yao


Hi there, welcome! I am currently a third-year Ph.D. student at the Gaoling School of Artificial Intelligence, Renmin University of China. I am truly honored to be advised by Prof. Yong Liu. From October 2023 to March 2024, as a research intern at Shanghai AI Laboratory, I was fortunate to work under the guidance of Dr. Jing Shao.

Prior to my Ph.D. studies, I earned my Bachelor of Engineering in Software Engineering from Huazhong University of Science and Technology in June 2022, where I was very fortunate to be advised by Prof. Kun He. During my undergraduate studies, I was honored to receive the National Scholarship (2019), a recognition that motivated me to pursue further research in AI.

Research Interests


My previous research focused on trustworthy AI, including fairness, robustness and interpretability (ICML 2025, TMLR 2024, ACL 2024, CVPR 2023).

I’m currently focused on superalignment, particularly weak-to-strong generalization, as reflected in our ACL 2025 paper and several recent preprints.

Preprints


(* indicates equal contribution, # indicates corresponding authors)

On Weak-to-Strong Generalization and f-Divergence

Wei Yao*, Gengze Xu*, Huayi Tang, Wenkai Yang, Donglin Di, Ziqiao Wang, Yong Liu#
arXiv preprint arXiv:2506.03109

On the Emergence of Weak-to-Strong Generalization: A Bias-Variance Perspective

Gengze Xu*, Wei Yao*, Ziqiao Wang#, Yong Liu#
arXiv preprint arXiv:2505.24313

The Capabilities and Limitations of Weak-to-Strong Generalization: Generalization and Calibration

Wei Yao*, Wenkai Yang*, Gengze Xu, Ziqiao Wang, Yankai Lin, Yong Liu#
arXiv preprint arXiv:2502.01458

Selected Publications


(* indicates equal contribution, # indicates corresponding authors)

Revisiting Weak-to-Strong Generalization in Theory and Practice: Reverse KL vs. Forward KL

Wei Yao*, Wenkai Yang*, Ziqiao Wang, Yankai Lin, Yong Liu#
ACL 2025 (Findings)

Understanding Model Ensemble in Transferable Adversarial Attack

Wei Yao*, Zeliang Zhang*, Huayi Tang, Yong Liu#
ICML 2025

Understanding Fairness Surrogate Functions in Algorithmic Fairness

Wei Yao*, Zhanke Zhou*, Zhicong Li, Bo Han, Yong Liu#
TMLR 2024

Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models

Chen Qian*, Jie Zhang*, Wei Yao*, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu#, Jing Shao#
ACL 2024 (Findings)

Fair Scratch Tickets: Finding Fair Sparse Networks without Weight Training

Pengwei Tang*, Wei Yao*, Zhicong Li, Yong Liu#
CVPR 2023

Service


Reviewer: NeurIPS, ICLR, AISTATS, TMLR, ACL, EMNLP