Hi 👋, I’m Lang Gao (/læŋ ɡaʊ/), an undergraduate student in Computer Science and Technology at Huazhong University of Science and Technology (HUST), expected to graduate in July 2025.
I am currently a research assistant at MBZUAI, which is a great place for research.
I am actively seeking PhD opportunities. If you have any relevant openings or suggestions, please feel free to contact me; I would be excited to discuss potential collaborations.
💡 Research Interests
- Large Language Models (LLMs)
  - Applications in various domains: LLMs for healthcare, security, and scientific research.
  - Data-Centric Solutions: constructing high-quality benchmarks and datasets to evaluate and improve LLMs on various tasks.
- Multimodal Large Language Models (MLLMs)
- Trustworthy and Explainable AI (XAI)
  - Building self-explaining deep-learning models and workflows that provide faithful and trustworthy predictions.
📖 Education
2021.09 - now: B.E. (expected), Huazhong University of Science and Technology (HUST)
Proficiencies
GPA: 4.28/5.00 (or 3.70/4.00 according to WES)
| Course | Score |
| --- | --- |
| Calculus | 97 |
| Software Engineering | 97 |
| Algorithmic Design & Analysis | 97 |
| Advanced Programming Language | 94 |
| Computer Vision | 94 |
| Principles of Imperative Computation | 94 |
| Operating System | 91 |
| Machine Learning | 91 |
| … | … |
Skills
- Deep Learning Frameworks: Proficient in PyTorch and TensorFlow
- Large Language Models: Proficient in prompt engineering (Chain-of-Thought, in-context learning, and few-shot learning) and fine-tuning techniques (PEFT, full-parameter training, and large-scale distributed training on server clusters), using DeepSpeed and Transformers (a minimal fine-tuning sketch follows this list)
- Strong Data Management and Processing Skills: deduplication, cleaning, formatting, and statistical analysis.
- Programming Languages: Proficient in Python, C, and C++; comfortable working in Linux environments.
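Below is a minimal, illustrative sketch of the PEFT-style fine-tuning workflow mentioned above, using Hugging Face Transformers and the `peft` library. The base model (`gpt2`), dataset (`wikitext-2`), and hyperparameters are placeholders chosen only for this example and are not taken from any specific project.

```python
# Minimal LoRA fine-tuning sketch with Hugging Face Transformers + peft.
# All model/dataset names and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Wrap the base model with LoRA adapters: only small low-rank matrices are
# trained while the original weights stay frozen.
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                         task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Tiny text corpus, tokenized for causal language modeling.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 0)  # drop empty lines

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out",
                           per_device_train_batch_size=4,
                           num_train_epochs=1,
                           logging_steps=50),
    train_dataset=tokenized,
    # mlm=False -> labels are shifted copies of input_ids (causal LM objective)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```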
📝 Publications
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
Yunfei Xie*, Ce Zhou*, Lang Gao*, Juncheng Wu*, Xianhang Li, Hong-Yu Zhou, Sheng Liu, Lei Xing, James Zou, Cihang Xie, and Yuyin Zhou (*: first co-authors) Toolkit & Code
“A comprehensive, large-scale multimodal dataset for medical vision-language models.”
VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models
Yu Liu*, Lang Gao*, Mingxin Yang*, Yu Xie, Ping Chen, Xiaojin Zhang, and Wei Chen (*: first co-authors)
“A novel, comprehensive benchmark specifically designed to assess the code vulnerability detection capabilities of LLMs.”
Attacking for Inspection and Instruction: The Risk of Spurious Correlations in Even Clean Datasets
Wei Liu, Zhiying Deng, Zhongyu Niu, Lang Gao, Jun Wang, Haozhao Wang, and Ruixuan Li
“An improved interpretable causal model architecture that can simultaneously avoid spurious correlations in data and those caused by insufficient training in traditional self-interpretable models.”
💼 Experience
- [2024.10 - now] MBZUAI, Research Assistant (Supervisor: Prof. Xiuying Chen; topic: interpretability in LLMs)
- [2024.07 - 2024.09] University of Notre Dame, Research Intern (Supervisor: Prof. Xiangliang Zhang; topic: LLMs for Bayesian optimization)
- [2024.01 - 2024.06] UC Santa Cruz, Research Intern (Supervisor: Prof. Yuyin Zhou; topic: vision-language models for healthcare)
- [2023.10 - 2023.12] HUST (Supervisor: Prof. Ruixuan Li; topic: interpretable deep learning frameworks)
🏆 Honors and Awards
- National First Prize, RAICOM Robotics Developer Contest - CAIR Engineering Competition National Finals, 2024
- National Second Prize, 15th China College Students’ Service Outsourcing Innovation and Entrepreneurship Competition, 2024
- National Second Prize, The 5th Integrated Circuit EDA Design Elite Challenge (Deep Learning Track), 2023
- National Third Prize, The 5th Global Campus Artificial Intelligence Algorithm Elite Competition, 2023
- National Third Prize, iFlytek Developer Competition, NLP Track, 2023
📜 References
You can find my full CV and an English Transcript here (latest update: Aug 14th).