Wei Yao
Hi there, welcome! I am currently a third-year Ph.D. student at the Gaoling School of Artificial Intelligence, Renmin University of China. I am truly honored to be advised by Prof. Yong Liu. From October 2023 to March 2024, as a research intern at Shanghai AI Laboratory, I was fortunate to work under the guidance of Dr. Jing Shao.
Prior to my Ph.D. studies, I earned my Bachelor of Engineering in Software Engineering from Huazhong University of Science and Technology in June 2022, where I was very fortunate to be advised by Prof. Kun He. During my undergraduate studies, I was honored to receive the National Scholarship (2019), a recognition that motivated me to pursue further research in AI.
Research Interests
My previous research focused on trustworthy AI, including fairness, robustness, and interpretability, with work published at ICML 2025, TMLR 2024, ACL 2024, and CVPR 2023.
I’m currently focused on superalignment, particularly weak-to-strong generalization, as demonstrated by our work at ACL 2025 and several upcoming preprints.
Preprints
(* indicates equal contribution, # indicates corresponding authors)
On Weak-to-Strong Generalization and f-Divergence
Wei Yao*, Gengze Xu*, Huayi Tang, Wenkai Yang, Donglin Di, Ziqiao Wang, Yong Liu#
arXiv preprint arXiv:2506.03109
On the Emergence of Weak-to-Strong Generalization: A Bias-Variance Perspective
Gengze Xu*, Wei Yao*, Ziqiao Wang#, Yong Liu#
arXiv preprint arXiv:2505.24313
The Capabilities and Limitations of Weak-to-Strong Generalization: Generalization and Calibration
Wei Yao*, Wenkai Yang*, Gengze Xu, Ziqiao Wang, Yankai Lin, Yong Liu#
arXiv preprint arXiv:2502.01458
Selected Publications
(* indicates equal contribution, # indicates corresponding authors)
Revisiting Weak-to-Strong Generalization in Theory and Practice: Reverse KL vs. Forward KL
Wei Yao*, Wenkai Yang*, Ziqiao Wang, Yankai Lin, Yong Liu#
ACL 2025 (Findings)
Understanding Model Ensemble in Transferable Adversarial Attack
Wei Yao*, Zeliang Zhang*, Huayi Tang, Yong Liu#
ICML 2025
Understanding Fairness Surrogate Functions in Algorithmic Fairness
Wei Yao*, Zhanke Zhou*, Zhicong Li, Bo Han, Yong Liu#
TMLR 2024
Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models
Chen Qian*, Jie Zhang*, Wei Yao*, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu#, Jing Shao#
ACL 2024 (Findings)
Fair Scratch Tickets: Finding Fair Sparse Networks without Weight Training
Pengwei Tang*, Wei Yao*, Zhicong Li, Yong Liu#
CVPR 2023
Service
Reviewer: NeurIPS, ICLR, AISTATS, TMLR, ACL, EMNLP