Predicting PFAS Diffusion Coefficients with Active Learning and Molecular Dynamics
A Jagadisan and H Boukhalfa and M Mehana, ENVIRONMENTAL SCIENCE & TECHNOLOGY, 59, 24997-25009 (2025).
DOI: 10.1021/acs.est.5c08559
Per- and polyfluoroalkyl substances (PFAS) are over 14 000 synthetic compounds with exceptional environmental persistence. Used extensively in industrial and consumer applications, PFAS resist degradation and accumulate in environmental media and living organisms, posing serious health risks, including cancer, liver damage, and immune dysfunction. This persistence and toxicity create urgent needs for environmental fate assessment. Predicting PFAS environmental transport remains challenging due to the lack of reliable diffusion coefficient data, which are critical for modeling contaminant mobility and designing remediation strategies. Experimental measurements are time-consuming and expensive, while fully computational approaches are infeasible given the scale of the chemical space. We developed an integrated machine learning and molecular dynamics (MD) framework using active learning to predict diffusion coefficients across the PFAS chemical space. Starting with measured diffusion coefficients, we train models using chemical graph-based representations and physicochemical descriptors. The approach iteratively identifies the molecules with the highest prediction uncertainty, performs targeted MD simulations, and retrains the models to efficiently explore chemical space while minimizing computational cost. The framework achieved significant performance improvements, reducing mean relative error by 88% and increasing R² from 0.095 to 0.907. Uncertainty-based sampling consistently outperformed random selection at optimal batch sizes of 50-100 compounds. This data-efficient approach enables transport property prediction across thousands of PFAS molecules, supporting environmental risk assessment and remediation planning.
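The uncertainty-driven loop described in the abstract (train, rank unlabeled molecules by prediction uncertainty, simulate the most uncertain batch, retrain) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a single scalar descriptor per molecule, a bootstrap ensemble of least-squares lines as the surrogate model, and the ensemble standard deviation as the uncertainty measure. The function `fake_md_simulation` is a hypothetical stand-in for an actual MD run, and all names here are illustrative.

```python
import random
import statistics


def fake_md_simulation(x):
    """Hypothetical stand-in for an MD run (assumption, not real physics)."""
    return 1.0 / (1.0 + x)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # degenerate bootstrap sample: fall back to a constant model
        return mean_y, 0.0
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def train_ensemble(labeled, rng, n_models=20):
    """Fit one line per bootstrap resample of the labeled (x, y) pairs."""
    models = []
    for _ in range(n_models):
        sample = [rng.choice(labeled) for _ in labeled]
        xs = [x for x, _ in sample]
        ys = [y for _, y in sample]
        models.append(fit_line(xs, ys))
    return models


def uncertainty(models, x):
    """Spread of the ensemble predictions at x, used as the uncertainty."""
    return statistics.pstdev(a + b * x for a, b in models)


def active_learning(pool, labeled, rounds, batch_size, seed=0):
    """Repeatedly label the most uncertain candidates and retrain."""
    rng = random.Random(seed)
    pool = list(pool)
    labeled = list(labeled)
    for _ in range(rounds):
        if not pool:
            break
        models = train_ensemble(labeled, rng)
        # Rank unlabeled candidates from most to least uncertain.
        pool.sort(key=lambda x: uncertainty(models, x), reverse=True)
        batch, pool = pool[:batch_size], pool[batch_size:]
        # "Simulate" the selected batch and add the results to the training set.
        labeled.extend((x, fake_md_simulation(x)) for x in batch)
    return labeled, pool


# Example: 3 seed measurements, 20 candidates, 2 rounds of 5 simulations each.
seed_data = [(x, fake_md_simulation(x)) for x in (0.0, 0.1, 0.2)]
candidates = [i / 10 for i in range(3, 23)]
labeled, remaining = active_learning(candidates, seed_data, rounds=2, batch_size=5)
```

A random-selection baseline, which the abstract uses for comparison, would replace the sort with a shuffle of `pool`. The batch size plays the role of the 50-100 compounds per iteration reported above, scaled down for the toy example.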