SimTrace: Exploiting Spatial and Temporal Sampling for Large-Scale Performance Analysis

ZB Xuan and X You and TY Feng and HL Yang and ZZ Luan and Y Liu and DP Qian, ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 22, 55 (2025).

DOI: 10.1145/3720544

MPI tracing tools are essential for collecting the communication events and performance metrics of large-scale programs for subsequent performance analysis and optimization. However, toward the exascale era, the performance and storage overhead of tracing becomes so prohibitive that it significantly disturbs the original execution of MPI programs, leading to distorted tracing data and thus misleading analysis results. Although process sampling can effectively reduce the tracing overhead, it can easily miss important execution information that is necessary for subsequent performance analysis. In this article, we propose SimTrace, a scalable MPI tracing tool with novel spatial and temporal sampling strategies that exploit the similarity among MPI processes to achieve low tracing overhead while still obtaining sufficient tracing information. The experimental results demonstrate that SimTrace can significantly reduce the MPI tracing overhead compared to state-of-the-art tracing tools, while enabling effective analysis to guide performance optimization of large-scale programs.
