Highly Accurate and Fast Prediction of MOF Free Energy via Machine Learning

AN Rubungo and F Fajardo-Rojas and DA Gomez-Gualdron and AB Dieng, JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, 147, 48035-48045 (2025).

DOI: 10.1021/jacs.5c13960

A major bottleneck in translating computational screening of metal-organic frameworks (MOFs) to laboratory synthesis is the uncertain synthetic accessibility of computer-generated MOF designs. While MOF free energy calculations are emerging as a way to anticipate synthetic accessibility, the computational cost of traditional molecular simulation methods remains incompatible with the immense MOF design space, which spans trillions of structures. Here, we propose and demonstrate the efficient prediction of this important quantity using machine learning. Our approach is based on three ingredients: (1) a newly curated data set of nearly 1 million MOFs (MOFMinE) and their properties, including simulated strain energies, as well as simulated free energies for a subset of these structures; (2) a novel sequence representation of MOFs, called MOFSeq, that incorporates both local and global MOF features; and (3) a large language model (LLM-Prop) pretrained on strain energy prediction and then fine-tuned on free energy prediction using MOFSeq. Our trained model achieves a mean absolute error (MAE) of 0.789 kJ/mol per MOF atom on free energy prediction. Remarkably, this same model, without retraining, demonstrated exceptional versatility by predicting whether MOFs were above or below an empirical free energy-based synthesizability threshold with a 97% F1 score. Furthermore, it correctly selected the lowest free energy MOF polymorph with an average accuracy of 78.1%, with this accuracy increasing to 100% for free energy differences greater than 4.05 kJ/mol per MOF atom and remaining above 60% even when having to sort out free energy differences as low as 0.16 kJ/mol per MOF atom.
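The three figures of merit quoted in the abstract (MAE, F1 score against a free energy threshold, and lowest-free-energy polymorph selection accuracy) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, the toy free energy values, the polymorph family labels, and the threshold of 3.5 are all hypothetical placeholders chosen for illustration, and the choice of "at or below threshold" as the positive class is an assumption.

```python
from collections import defaultdict


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_below_threshold(y_true, y_pred, threshold):
    """F1 score where the positive class is 'free energy <= threshold'.

    Treating 'at or below threshold' as the positive class is an
    assumption of this sketch.
    """
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        true_pos, pred_pos = t <= threshold, p <= threshold
        if true_pos and pred_pos:
            tp += 1
        elif pred_pos:
            fp += 1
        elif true_pos:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def polymorph_selection_accuracy(families, y_true, y_pred):
    """Fraction of polymorph families (2+ members) for which the predicted
    lowest-free-energy member matches the reference lowest one."""
    members = defaultdict(list)
    for i, family in enumerate(families):
        members[family].append(i)
    multi = [idx for idx in members.values() if len(idx) > 1]
    hits = sum(
        min(idx, key=lambda i: y_true[i]) == min(idx, key=lambda i: y_pred[i])
        for idx in multi
    )
    return hits / len(multi)


# Toy data (hypothetical, not taken from the paper).
families = ["A", "A", "B", "B", "C"]   # polymorph family labels
y_true = [1.0, 3.0, 5.0, 4.0, 2.0]     # reference free energies
y_pred = [1.5, 2.5, 4.0, 4.5, 2.0]     # model predictions
THRESHOLD = 3.5                        # hypothetical placeholder value

mae = mean_absolute_error(y_true, y_pred)
f1 = f1_below_threshold(y_true, y_pred, THRESHOLD)
acc = polymorph_selection_accuracy(families, y_true, y_pred)
print(mae, f1, acc)
```

In this toy example, family B shows how a model can have a small per-structure error and still rank two close polymorphs in the wrong order, which is consistent with the abstract's observation that selection accuracy depends on the size of the free energy difference.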
