Yihan Wang

Hi, I am currently a research scientist at Mistral AI. I completed my Ph.D. in Computer Science at UCLA, where I was fortunate to be advised by Prof. Cho-Jui Hsieh. I received my B.Eng. degree from Tsinghua University in June 2020.

My Ph.D. research focused on the robustness and generalization of machine learning models, with a recent emphasis on large language models. I work on projects that both interest me and benefit society. I was supported by an Amazon Fellowship during my Ph.D. studies.

In the early years of my Ph.D., I also worked on formal verification of neural networks and other machine learning models.

selected publications

* indicates equal contribution.

arXiv Preprint

On the loss of context-awareness in general instruction fine-tuning

Yihan Wang*, Andrew Bai*, Nanyun Peng, and Cho-Jui Hsieh

2024

ACL Findings 2024

Defending LLMs against Jailbreaking Attacks via Backtranslation

Yihan Wang*, Zhouxing Shi*, Andrew Bai, and Cho-Jui Hsieh

ACL Findings, 2024

ICLR 2024

Two-stage LLM Fine-tuning with Less Specialization and More Generalization

Yihan Wang, Si Si, Daliang Li, Michal Lukasik, Felix Yu, Cho-Jui Hsieh, Inderjit S Dhillon, and Sanjiv Kumar

In The Twelfth International Conference on Learning Representations, 2024

NeurIPS 2023

Universality and limitations of prompt tuning

Yihan Wang, Jatin Chauhan, Wei Wang, and Cho-Jui Hsieh

Advances in Neural Information Processing Systems, 2023

TACL

Red teaming language model detectors with language models

Zhouxing Shi*, Yihan Wang*, Fan Yin*, Xiangning Chen, Kai-Wei Chang, and Cho-Jui Hsieh

Transactions of the Association for Computational Linguistics, 2023

NeurIPS 2021

Fast certified robust training with short warmup

Zhouxing Shi*, Yihan Wang*, Huan Zhang, Jinfeng Yi, and Cho-Jui Hsieh

Advances in Neural Information Processing Systems, 2021

NeurIPS 2020

Automatic perturbation analysis for scalable certified robustness and beyond

Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh

Advances in Neural Information Processing Systems, 2020
