Similarity Metric for Data Optimization and Efficient Training of Reactive Machine Learning Force Fields for Hydrocarbon Radiolysis

K Kim and MP Kroonblawd and N Goldman, JOURNAL OF CHEMICAL THEORY AND COMPUTATION, 21, 11079-11092 (2025).

DOI: 10.1021/acs.jctc.5c01301

Radiolysis is a common approach to sterilize polymers, chemically modify them for upcycling, and accelerate their decomposition for recycling. Reactive molecular dynamics (MD) simulations provide a powerful tool for generating atomic-level trajectories of the reactive processes and quantifying radiolytic chemical degradation pathways. Machine learning (ML) surrogate models for reactive force fields with quantum mechanical accuracy are now widely used for this purpose. These models require ML training data sets that provide information on the atomic environments of the target chemical systems. However, radiolysis chemistry can be highly complex and diverse, which makes it difficult to generate training data for parametrizing ML models. To address this, we developed a method for optimizing the training data set using a cosine similarity metric. The method helps guide training set selection for radiolysis of polyethylene, a model hydrocarbon polymer, and enhances the transferability of our reactive ML force field (MLFF) to a variety of molecular and polymeric systems. Our approach performs atom-by-atom comparisons between local atomic environments to pinpoint important data points associated with rare and localized events, such as radiolysis damage within structures. We apply this approach to train the Chebyshev Interaction Model for Efficient Simulation (ChIMES) MLFF, which expresses the atomic interaction potentials as linear combinations of many-body Chebyshev polynomials. We first show that our method can reduce the training set size by approximately 70% while improving overall accuracy compared to more standard MD model fitting approaches. We then validate our optimal model against diverse hydrocarbon simulation data, including simple alkanes and systems with unsaturated carbon bonds, over a wide range of thermodynamic conditions.
Finally, we use our ChIMES model to perform MD simulations of radiolytic damage in large-scale systems that help avoid system-size effects. Overall, our approach yields an MD force field that retains most of the accuracy of the underlying quantum method while providing an improvement of many orders of magnitude in computational efficiency. Our efforts will inform future studies of hydrocarbon polymer radiolysis, where the chemical details of polymer-radiation interactions can strongly affect the products observed in experiments.
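The abstract describes atom-by-atom cosine-similarity comparisons between local atomic environments, used to flag rare, localized events for the training set. It does not specify the descriptors, threshold, or selection loop. The following is a minimal sketch under stated assumptions. Each candidate frame is assumed to already be an array of per-atom descriptor vectors (one row per atom). The similarity between two environments is cos(a, b) = a·b / (|a||b|). The function names `cosine_similarity_matrix`, `novel_atom_mask`, and `select_frames`, the 0.99 threshold, and the greedy selection strategy are all hypothetical. They illustrate the general idea and are not the authors' implementation.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    # Normalize each descriptor to unit length; the clip guards against zero vectors.
    a_unit = a / np.clip(np.linalg.norm(a, axis=1, keepdims=True), 1e-12, None)
    b_unit = b / np.clip(np.linalg.norm(b, axis=1, keepdims=True), 1e-12, None)
    return a_unit @ b_unit.T  # entry (i, j) = cosine similarity of a_i and b_j


def novel_atom_mask(candidate: np.ndarray, reference: np.ndarray,
                    threshold: float) -> np.ndarray:
    """True for each candidate atom whose best match in the reference set is below threshold."""
    best_match = cosine_similarity_matrix(candidate, reference).max(axis=1)
    return best_match < threshold


def select_frames(frames, reference: np.ndarray, threshold: float = 0.99):
    """Hypothetical greedy screen: keep a frame if any of its atoms is not yet represented.

    frames:    list of (n_atoms, d) per-atom descriptor arrays (assumed precomputed)
    reference: (m, d) descriptors of atomic environments already in the training set
    """
    selected = []
    ref = reference.copy()
    for idx, descriptors in enumerate(frames):
        mask = novel_atom_mask(descriptors, ref, threshold)
        if mask.any():
            selected.append(idx)
            # Add only the novel atomic environments to the reference pool.
            ref = np.vstack([ref, descriptors[mask]])
    return selected


if __name__ == "__main__":
    # Toy 3-D "descriptors": frame 1 contains an environment unlike the reference;
    # frame 2 repeats it (scaled), so it is redundant once frame 1 is kept.
    reference = np.array([[1.0, 0.0, 0.0]])
    frames = [np.array([[2.0, 0.0, 0.0]]),
              np.array([[0.0, 1.0, 0.0]]),
              np.array([[0.0, 3.0, 0.0]])]
    print(select_frames(frames, reference))  # -> [1]
```

In this sketch, each atom is compared individually instead of averaging descriptors over a whole frame. As a result, a single unusual site, such as a radical or a broken bond in an otherwise ordinary polymer frame, can still mark that frame as worth keeping, while frames that only repeat known environments are skipped.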
