
Yang Zhao, Yue Dai, Liang Qiao, et al.
Submitted to ASPLOS'27
We propose DepBloom to accelerate multi-GPU temporal graph neural network (TGNN) training with three techniques: (1) temporally bounded spatial partitioning to reduce unnecessary cross-GPU synchronization and improve parallelism; (2) intra-batch memory refinement to recover missing short-range temporal dependencies inside enlarged batches; and (3) critical-path optimization to prioritize urgent memory updates while deferring non-critical work during multi-GPU pipeline execution. Experimental results across representative models and diverse datasets show that DepBloom reduces multi-GPU training latency by 2.63$\times$ on average and up to 5.16$\times$ compared to state-of-the-art methods, while maintaining comparable or better model accuracy.

J. Chen, Z. Wang, Yang Zhao, Z. Zhang
Submitted paper
This submission studies exascale smoothed particle hydrodynamics for planetary-defense scenarios and focuses on quantifying cascading hazards from Chicxulub-scale impacts with two-trillion-particle simulations.

Yang Zhao, J. Chen, L. Song, J. Cheng, H. An
Preprint
We present gSpTRSV, a GPU sparse triangular solve approach that combines graph-based scheduling with sync-free execution to improve parallelism and efficiency on irregular sparse workloads.

Zhong Zheng, Junshi Chen, Yang Zhao, Longsheng Song, Xinming Qin, Hong An
In the 53rd International Conference on Parallel Processing (ICPP 2024), Aug 2024
This paper presents a massively distributed block-sparse matrix-matrix multiplication method designed for linear-scaling density-functional-theory calculations.

Yi Zhang, Ziyu Zhang, Yang Zhao, Junshi Chen, Hong An, Zhanming Wang, Longkui Chen
In the 53rd International Conference on Parallel Processing (ICPP 2024), Aug 2024
This paper studies multi-level load-balancing strategies for massively parallel smoothed particle hydrodynamics simulation on large-scale systems.

Shenghong Huang, Junshi Chen, Ziyu Zhang, Xiaoyu Hao, Jun Gu, Hong An, Chun Zhao, Yan Hu, Zhanming Wang, Longkui Chen, Yifan Luo, Jineng Yao, Yi Zhang, Yang Zhao, Zhihao Wang, Dongning Jia, Zhao Jin, Changming Song, Xisheng Luo, Xiaobin He, Dexun Chen
In the International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2023), Nov 2023
This work establishes a high-resolution global atmospheric-circulation modeling system for submarine volcanic eruptions and demonstrates extreme-scale SPH simulation at the 400-billion-particle scale.