Yidong Xia
Idaho National Laboratory 
Many-GPU Simulation of Nanopore Flow on the Summit Supercomputer
Fluid flow in nanoporous materials can behave differently from what continuum physics predicts at larger scales. Numerical simulations can complement laboratory experiments. This work presents a dissipative particle dynamics (DPD) package for GPU-accelerated mesoscale flow simulations in nanoporous materials. In an idealized benchmark that minimizes load imbalance, the package delivered nearly perfect strong and weak scaling (with up to 4 billion DPD particles) on up to 1,536 V100 GPUs on Oak Ridge National Laboratory's Summit supercomputer. More notably, in a benchmark designed to measure practical usefulness with realistic nanoporous silica geometries, the package achieved more than a 20x speedup over its LAMMPS-based CPU counterpart on the same number of nodes (e.g., 384 V100 GPUs vs. 2,688 Power9 cores). In addition, the NVLink2 host-to-device interconnects kept CPU-GPU memory copies to only 10% of GPU activity time per rank, which is 4 times lower than with PCIe interconnects.