Tianyu Guo 郭天宇

E-mail: guoty9[at]mail2.sysu.edu.cn

About

I’m a fourth-year Ph.D. student in Computer Science and Technology at Sun Yat-Sen University, co-advised by Assoc. Prof. Xianwei Zhang and Prof. Nong Xiao. I completed my bachelor’s degree at Xidian University. My research interests lie in GPU architecture, MLSys, and AI Infra. I’m also passionate about the open source community (check out my projects/PRs). You can also have a look at my RESUME for more details.

Publications

^* Equal contribution

[arXiv]
RServe: Overlapping Encoding and Prefill for Efficient LMM Inference
Tianyu Guo, Tianming Xu, Xianjie Chen, Junru Chen, Nong Xiao, Xianwei Zhang
[NeurIPS’25] [CCF-A] [OpenReview] [Github]
DynaPipe: Dynamic Layer Redistribution for Efficient Serving of LLMs with Pipeline Parallelism
Hongxin Xu^*, Tianyu Guo^* and Xianwei Zhang, The Thirty-Ninth Annual Conference on Neural Information Processing Systems, San Diego, CA, United States, December 2025.
[SC’25] [CCF-A] [DOI] [Slide] [arXiv] [Github]
gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling
Tianyu Guo, Xianwei Zhang, Jiangsu Du, Zhiguang Chen, Nong Xiao, Yutong Lu, The International Conference for High Performance Computing, Networking, Storage, and Analysis, St. Louis, MO, United States, November 2025.
[Euro-Par’25] [CCF-B] [DOI] [Slide] [arXiv] [Github]
EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse
Tianyu Guo, Hande Dong, Yichong Leng, Feng Liu, Cheater Lin, Nong Xiao and Xianwei Zhang, The 31st International European Conference on Parallel and Distributed Processing, Dresden, Germany, August 2025.
[ASP-DAC’25] [CCF-C] [DOI]
Mpache: Interaction Aware Multi-level Cache Bypassing on GPUs
Mengyue Xi, Tianyu Guo, Xuanteng Huang, Zejia Lin, Xianwei Zhang, The 30th Asia and South Pacific Design Automation Conference, Tokyo Odaiba Miraikan, Japan, January 2025.
[DAC’24] [CCF-A] [DOI] [Slide]
SMILE: LLC-based Shared Memory Expansion to Improve GPU Thread Level Parallelism
Tianyu Guo, Xuanteng Huang, Kan Wu, Xianwei Zhang and Nong Xiao, The 61st ACM/IEEE Design Automation Conference, San Francisco, CA, United States, June 2024.

Experience

2026.03 - Present

Intern at Tencent Hunyuan
- vllm-router integration for RL
- Scheduling policy optimization for large-scale code agent production
- PD disaggregation integration and support for RL

2025.12 - 2026.03

Intern at Moonshot AI
- Eagle3 adaptation and training
- Chinese corpus training with SpecForge

2025.08 - 2025.12

RedStar intern at RedNote hilab
- [RServe] Large multimodal model inference and EPD disaggregation

2024.01 - 2024.06

Research intern at Tencent Code Buddy team
- [CrossKV] [KVsail] [EFIM] KV cache reuse and offloading
- LLM inference systems with extreme performance

2023.08 - 2023.12

Participated in the 1st ACTIC and won 2nd prize in the A3 track
- [Preliminary]/[Final] Presentations and [Technical Report]
- Operator implementation and performance optimization with vector instruction set

2022.10 - 2023.01

Teaching Assistant of “SYSU-DCS3013 : Computer Architecture”
- Released [SYSU-ARCH LAB], which focuses on using and extending simulators

Projects

Presentations & HW & Dissertation

Work report 24 summer
Weekly Paper Sharing SC23 “Frontier: Exploring Exascale”
Weekly Paper Sharing MLSYS23 “AUTOSCRATCH: ML-OPTIMIZED CACHE MANAGEMENT FOR INFERENCE-ORIENTED GPUS”
Work report 22-23
Weekly Paper Sharing HPCA23 “DIMM-Link: Enabling Efficient Inter-DIMM Communication for Near-Memory Processing”
AI final Homework “A Convolutional Neural Network Framework support on CPU and GPU”
Bachelor’s dissertation “General Computing optimization for GPU based on Cache management”

Tianyu Guo 郭天宇

https://gty111.github.io/info/index.html

Author

Tianyu Guo

Posted on

2026-06-13

Updated on

2026-06-13
