Zhuohao Li

I was a Ph.D. student at UCLA EECS. I helped build SGLang, a fast serving framework for large language models and vision language models, with Ying Sheng and Lianmin Zheng. SGLang has been widely adopted in both industry and academia. My research interests lie in efficient large language models (LLMs), post-training, and scaling inference. I obtained my Bachelor's degree from Shanghai Jiao Tong University. I also visited UT Austin Computer Science and spent an exchange year at HKUST CSE.

During my career, I have been fortunate to work and intern at NVIDIA, AWS AI Lab, and Shanghai AI Laboratory. I was the recipient of the UCLA Samueli Fellowship, the UT Austin Cockrell School of Engineering Fellowship, and the SenseTime Scholarship.

I also consider myself a trader, with a focus on options and hedging.

zhuohaol [at] ucla [dot] edu
Santa Clara, CA, United States

I am actively interested in potential collaborations on LLMs and VLMs. If you want to discuss ideas or have a chat, please feel free to send me an email or book an appointment with me.

News

[7/2025] 🎉 Hawkeye has been accepted by COLM'25.
[3/2025] 🎉 Hawkeye, an RL-based fast-reasoning framework, has been open-sourced. Paper: [Link].
[3/2025] 🎉 Torpor has been accepted by USENIX ATC'25.
[6/2024] 🎉 I joined AWS AI Lab as an Applied Scientist Intern this summer!
[3/2024] I gave a talk at UC Irvine.
[5/2023] 🎉 Officially graduated from SJTU, and my thesis received the highest honor at SJTU!
[3/2023] 🎉 I joined Shanghai AI Lab as a Research Scientist Intern this summer!
[6/2022] 🎉 I joined NVIDIA as a SWE Intern!

Selected Publications (* equal contribution; the full publication list can be found on my Google Scholar)
  • Hawkeye: Efficient Reasoning with Model Collaboration
    Jianshu She*, Zhuohao Li*, Zhemin Huang, Mu Li, Peiran Xu, Qirong Ho
    code / project page / arXiv'25 / COLM 2025
  • CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing
Chen Yang, Chenyang Zhao, Zhuohao Li, Quanquan Gu, Dongruo Zhou
    code / project page / arXiv'25 / Poster
  • Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement
    Yongji Wu, Wenjie Qu, Tianyang Tao, Zhuang Wang, Wei Bai, Zhuohao Li, Yuan Tian, Jiaheng Zhang, Matthew Lentz, Danyang Zhuo
    code / project page / arXiv'24 / Poster
  • Torpor: GPU-Enabled Serverless Computing for Low-Latency, Resource-Efficient Inference
    Minchen Yu, Ao Wang, Dong Chen, Haoxuan Yu, Xiaonan Luo, Zhuohao Li, Wei Wang, Ruichuan Chen, Dapeng Nie, Haoran Yang
    code / project page / arXiv'23 / USENIX ATC'25
Misc.
  • I'm a fan of motor racing and Formula 1, particularly Max Verstappen (#33).
  • I'm a fan of ATP tennis, particularly Carlos Alcaraz.
  • I love snowboarding. You can find me on Xiaohongshu (小红书) and Instagram.
Services
  • Reviewer: COLM'25, ACL'25, EMNLP'25, USENIX Security'24, CCS'24