Homepage - Yang Zhao

Yang Zhao

Ph.D. Student

University of Science and Technology of China

I am a Ph.D. student in Computer Science at the University of Science and Technology of China (USTC), advised by Prof. Hong An. My research focuses on efficient graph deep learning systems and heterogeneous parallelism, with an emphasis on designing high-performance system support for emerging graph learning workloads.

In particular, I work on scalable training and acceleration techniques for graph-based workloads across modern parallel hardware. My recent research includes distributed acceleration for temporal graph neural networks, where I develop dependency-aware multi-GPU training frameworks that improve parallelism and reduce synchronization overhead while preserving model accuracy. I have also worked on efficient sparse triangular solve on GPUs, proposing graph-based scheduling and sync-free execution techniques to improve performance on irregular sparse workloads.

aflyingsheep61(at)gmail.com Google Scholar GitHub LinkedIn ORCID

Education

University of Science and Technology of China

Computer Science and Technology
Ph.D. Student

Sep. 2023 - present

Dalian University of Technology

B.S. in Computer Science and Technology

Sep. 2019 - Jul. 2023

Honors & Awards

First-Class Academic Scholarship

2025, 2024, 2023

Outstanding Graduates

2023

National Scholarship

2022

Taoli Alumni Scholarship

2023

Selected Publications (view all)

DepBloom: A Dependency-Aware Framework for Multi-GPU Temporal Graph Neural Networks Training

Yang Zhao, Yue Dai, Liang Qiao, et al.

Submitted to ASPLOS'27

We propose DepBloom to accelerate multi-GPU temporal graph neural network (TGNN) training with three techniques: (1) temporally bounded spatial partitioning to reduce unnecessary cross-GPU synchronization and improve parallelism; (2) intra-batch memory refinement to recover missing short-range temporal dependencies inside enlarged batches; and (3) critical-path optimization to prioritize urgent memory updates while deferring non-critical work during multi-GPU pipeline execution. Experimental results across representative models and diverse datasets show that DepBloom reduces multi-GPU training latency by 2.63× on average and by up to 5.16× compared to state-of-the-art methods, while maintaining comparable or better model accuracy.

[Code]

GSpTRSV: A Sparse Triangular Solve on GPUs Combining Graph and Sync-Free Method

Yang Zhao, J. Chen, L. Song, J. Cheng, H. An

Preprint

We present gSpTRSV, a GPU sparse triangular solve approach that combines graph-based scheduling with sync-free execution to improve parallelism and efficiency on irregular sparse workloads.