2026

DepBloom: A Dependency-Aware Framework for Multi-GPU Temporal Graph Neural Networks Training

Yang Zhao, Yue Dai, Liang Qiao, et al.

Submitted to ASPLOS'27

We propose DepBloom to accelerate multi-GPU TGNN training with three techniques: (1) temporally bounded spatial partitioning to reduce unnecessary cross-GPU synchronization and improve parallelism; (2) intra-batch memory refinement to recover missing short-range temporal dependencies inside enlarged batches; and (3) critical-path optimization to prioritize urgent memory updates while deferring non-critical work during multi-GPU pipeline execution. Experimental results across representative models and diverse datasets show that DepBloom reduces multi-GPU training latency by 2.63$\times$ on average and by up to 5.16$\times$ compared to state-of-the-art methods, while maintaining comparable or better model accuracy.

Exascale SPH for Planetary Defense: Quantifying Cascading Hazards of Chicxulub-Scale Impacts with Two Trillion Particles

J. Chen, Z. Wang, Yang Zhao, Z. Zhang

Submitted paper

This submission studies exascale smoothed particle hydrodynamics for planetary-defense scenarios and focuses on quantifying cascading hazards from Chicxulub-scale impacts with two-trillion-particle simulations.

GSpTRSV: A Sparse Triangular Solve on GPUs Combining Graph and Sync-Free Method

Yang Zhao, J. Chen, L. Song, J. Cheng, H. An

Preprint

We present gSpTRSV, a GPU sparse triangular solve approach that combines graph-based scheduling with sync-free execution to improve parallelism and efficiency on irregular sparse workloads.

2024

DB-SpGEMM: A Massively Distributed Block-Sparse Matrix-Matrix Multiplication for Linear-Scaling DFT Calculations

Zhong Zheng, Junshi Chen, Yang Zhao, Longsheng Song, Xinming Qin, Hong An

In The 53rd International Conference on Parallel Processing (ICPP 2024), Aug 2024

This paper presents a massively distributed block-sparse matrix-matrix multiplication method designed for linear-scaling density-functional-theory calculations.

Multi-level Load Balancing Strategies for Massively Parallel Smoothed Particle Hydrodynamics Simulation

Yi Zhang, Ziyu Zhang, Yang Zhao, Junshi Chen, Hong An, Zhanming Wang, Longkui Chen

In The 53rd International Conference on Parallel Processing (ICPP 2024), Aug 2024

This paper studies multi-level load-balancing strategies for massively parallel smoothed particle hydrodynamics simulation on large-scale systems.

2023

Establishing a Modeling System in 3-km Horizontal Resolution for Global Atmospheric Circulation Triggered by Submarine Volcanic Eruptions with 400 Billion Smoothed Particle Hydrodynamics

Shenghong Huang, Junshi Chen, Ziyu Zhang, Xiaoyu Hao, Jun Gu, Hong An, Chun Zhao, Yan Hu, Zhanming Wang, Longkui Chen, Yifan Luo, Jineng Yao, Yi Zhang, Yang Zhao, Zhihao Wang, Dongning Jia, Zhao Jin, Changming Song, Xisheng Luo, Xiaobin He, Dexun Chen

In The International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2023), Nov 2023

This work establishes a high-resolution global atmospheric-circulation modeling system for submarine volcanic eruptions and demonstrates extreme-scale SPH simulation at the 400-billion-particle scale.
