Hi 👋, I’m Lang Gao (/læŋ ɡaʊ/).

I am currently a first-year PhD student at MBZUAI, a great place for research. I’m fortunate to be supervised by Dr. Xiuying Chen, an outstanding rising star and a truly supportive mentor.

💡 Interests

Mechanistic Interpretability (MI): Interpret the behaviors of LLMs empirically or theoretically, and develop empirically or theoretically interpretable approaches to enhance LLMs.
- Recently, I have been using MI to address various trustworthiness issues (jailbreaking, bias, etc.) in LLMs. This often yields highly efficient and effective solutions!
Reliable Application of AI (secondary): Explore the reliable application of machine learning models, particularly in biomedical domains.
- I see these as potential platforms to extend and prove the usefulness of MI.

I’m always happy to connect with anyone interested in interpretability. It’s a field full of different sparks, and I’m eager to learn from new perspectives. Feel free to reach out!

⚙️ Skills

Deep learning frameworks like Transformers, PyTorch, etc.
Mechanistic Interpretability toolkits: NNsight, TransformerLens, SAELens.

📝 Publications

🧑‍🔬 Mechanistic Interpretability

Evaluate Bias without Manual Test Sets: A Concept Representation Perspective for LLMs

Lang Gao, Kaiyang Wan, Wei Liu, Chenxi Wang, Zirui Song, Zixiang Xu, Yanbo Wang, Veselin Stoyanov, and Xiuying Chen

“BiasLens is a new interpretable method that directly examines concept representations inside LLMs to detect hidden biases, without relying on any human-labeled data.”

Code

Shaping the Safety Boundaries: Understanding and Defending Against Jailbreaks in Large Language Models

Lang Gao, Jiahui Geng, Xiangliang Zhang, Preslav Nakov, and Xiuying Chen

“We interpret the common mechanisms of diverse LLM jailbreak attacks in the activation space and propose an efficient defense method.”

Word Form Matters: LLMs’ Semantic Reconstruction under Typoglycemia

Chenxi Wang, Tianle Gu, Zhongyu Wei, Lang Gao, Zirui Song, and Xiuying Chen

“How do LLMs make sense of scrambled input words—and why do they trust word form more than context?”

Code

Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets

Wei Liu, Zhongyu Niu, Lang Gao, Zhiying Deng, Jun Wang, Haozhao Wang, and Ruixuan Li

“An interpretable, causal learning paradigm that simultaneously avoids spurious correlations in data and traditional self-interpretable models.”

Code

👨‍🔧 Applications

MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

Yunfei Xie*, Ce Zhou*, Lang Gao*, Juncheng Wu*, Xianhang Li, Hong-Yu Zhou, Sheng Liu, Lei Xing, James Zou, Cihang Xie, and Yuyin Zhou (*: co-first authors)

“A comprehensive, large-scale multimodal dataset for medical vision-language models.”

Toolkit & Code

VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models

Yu Liu*, Lang Gao*, Mingxin Yang*, Yu Xie, Ping Chen, Xiaojin Zhang, and Wei Chen (*: co-first authors)

“A novel, comprehensive benchmark, specifically designed to assess the code vulnerability detection capabilities of LLMs.”

Toolkit & Code

🧐 Service

2025, Reviewer: ACL, EMNLP, NLPCC.

💼 Experience

[10 / 2024 - 07 / 2025] MBZUAI, Research Intern (Supervisor: Dr. Xiuying Chen, topic: Mechanistic Interpretability of LLMs)
[07 / 2024 - 10 / 2024] University of Notre Dame, Research Intern (Supervisor: Prof. Xiangliang Zhang, topic: LLMs for Bayesian Optimization)
[01 / 2024 - 06 / 2024] UC Santa Cruz, Research Intern (Supervisor: Dr. Yuyin Zhou, topic: Vision-language models for healthcare)
[10 / 2023 - 12 / 2023] HUST (Supervisor: Prof. Ruixuan Li, topic: Interpretable deep learning frameworks)

💬 I am deeply grateful to all the mentors and collaborators who have guided and supported me along the way. Your encouragement, trust, and inspiration have made all the difference in my journey.

📖 Education

08 / 2025 - Now : Ph.D. student, Mohamed bin Zayed University of Artificial Intelligence

09 / 2021 - 07 / 2025 : B.E., Huazhong University of Science and Technology

🧩 Miscellaneous

📚 Resources

Insights

Book: Interpretability in Deep Learning [Link]
Book: Interpretable Machine Learning [Link]
Book: Trustworthy Machine Learning [Link]
Book: 大语言模型 (The Chinese Book for Large Language Models) [Link]
Article: The Bitter Lesson [Link]
Article: The Urgency of Interpretability [Link]

Blogs

[05/24] [Chinese] National Undergraduate Innovation Project Documentation. [Link]
[03/24] [Chinese] Negative Transfer. [Link]
[03/24] [Chinese] Mixture of Experts Explained. [Link]
[01/24] [Chinese] EMNLP2020 Tutorial Notes (Topic: Explainable AI). [Link]

Other Stuff

I also like photography. Sometimes I take good photos by accident. So I might upload a few here someday, along with some unnecessary commentary, but feel free to pretend you’re looking forward to it.🙃