Parallelization Strategies for DeepMD-Kit Using OpenMP: Enhancing Efficiency in Machine Learning-Based Molecular Simulations

Q Du and F Wang and CK Wu, IEEE TRANSACTIONS ON COMPUTERS, 74, 3534-3545 (2025).

DOI: 10.1109/TC.2025.3595078

DeepMD-kit enables deep learning-based molecular dynamics (MD) simulations that require efficient parallelization to leverage modern HPC architectures. In this work, we optimize DeepMD-kit using advanced OpenMP strategies to improve scalability and computational efficiency on an ARMv8 processor-based server. Our optimizations include data parallelism for neural network inference, force calculation acceleration, NUMA-aware memory management, and synchronization reductions, leading to up to 4.1x speedup and 82% higher memory bandwidth efficiency compared to the baseline implementation. Strong scaling analysis demonstrates superlinear speedup at mid-range core counts, with improved workload balancing and vectorized computations. However, challenges remain at ultra-large scales due to increasing synchronization overhead.
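
As a rough illustration of two of the OpenMP techniques named in the abstract, the sketch below shows NUMA-aware first-touch initialization and an OpenMP reduction that replaces explicit per-iteration synchronization. It is not taken from the paper or from DeepMD-kit: the function names first_touch_alloc and sum_energies are hypothetical, and the first-touch page placement it relies on is an assumption about the operating system's default memory policy (as is typical on Linux).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper (not a DeepMD-kit function): allocate an array and
// initialize it in parallel. Assuming a first-touch NUMA policy, each page
// is placed on the memory node of the thread that first writes to it.
std::unique_ptr<double[]> first_touch_alloc(std::size_t n, double value) {
    // new double[n] leaves the elements uninitialized, so no page has been
    // touched by the allocating thread yet.
    std::unique_ptr<double[]> data(new double[n]);
    const long count = static_cast<long>(n);
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < count; ++i) {
        data[i] = value;
    }
    return data;
}

// Hypothetical helper: sum per-atom energies. The reduction clause gives each
// thread a private partial sum that is combined once at the end of the loop,
// instead of synchronizing on a shared variable in every iteration.
double sum_energies(const double* energies, std::size_t n) {
    assert(energies != nullptr || n == 0);
    double total = 0.0;
    const long count = static_cast<long>(n);
    // The same static schedule as in first_touch_alloc keeps each thread on
    // the index range it initialized, and therefore on its local pages.
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (long i = 0; i < count; ++i) {
        total += energies[i];
    }
    return total;
}
```

Compiled without OpenMP support, the pragmas are ignored and both functions run serially with the same results, so the sketch can be checked on any machine. Whether the matching static schedules actually improve locality depends on thread pinning (for example through OMP_PROC_BIND and OMP_PLACES), which this sketch does not configure.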
