Keynote
Invited Talk

Automated Generation of Training Data for LAMMPS Potentials using MaxEntropy


Machine-learning interatomic potentials (MLIPs) have taken the world of molecular dynamics by storm due to their dramatically improved accuracy compared to conventional empirical potentials. While near-quantum accuracy is (locally) within reach, MLIPs often show poor transferability to configurations that differ significantly from their training data. This makes the curation of the training set critical but also challenging, typically requiring multiple iterations, rigorous validation, and human input. In this talk, I will discuss an automated, information-theoretic approach to dataset generation in which the information entropy of the whole dataset, as measured in an ML feature space, is systematically maximized. I will show that this approach generates compact datasets that lead to ultra-robust MLIPs for a range of materials with minimal human intervention. I will also show how the flexible MLIAPPY interface in LAMMPS made our implementation very simple and intuitive by representing the dataset entropy as an effective potential on which atoms can evolve.
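
The idea of treating dataset entropy as an effective potential can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the talk: the descriptor (`toy_descriptor`) is a hypothetical stand-in for a real ML feature map, the entropy is approximated by a nearest-neighbour proxy (mean log nearest-neighbour distance in feature space), forces are obtained by finite differences, and the atoms are moved by a simple accept-or-shrink descent rather than by LAMMPS dynamics through the MLIAPPY interface.

```python
import numpy as np

def toy_descriptor(positions):
    # Hypothetical stand-in for an ML descriptor: maps each atom's
    # coordinates into a bounded feature space.
    return np.tanh(positions)

def entropy_proxy(features):
    # Mean log nearest-neighbour distance in feature space. This is a
    # nearest-neighbour entropy estimate up to constants (an assumption,
    # not necessarily the estimator used in the talk).
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore self-distances
    return float(np.mean(np.log(dist.min(axis=1) + 1e-12)))

def effective_potential(positions):
    # Lower "energy" corresponds to higher feature-space entropy.
    return -entropy_proxy(toy_descriptor(positions))

def numerical_forces(positions, eps=1e-6):
    # Central-difference forces: F = -dU/dx for every coordinate.
    forces = np.zeros_like(positions)
    for idx in np.ndindex(positions.shape):
        step = np.zeros_like(positions)
        step[idx] = eps
        up = effective_potential(positions + step)
        down = effective_potential(positions - step)
        forces[idx] = -(up - down) / (2 * eps)
    return forces

def relax(positions, n_steps=100, step_size=0.1):
    # Move atoms along the forces; accept a move only if it lowers the
    # effective potential, otherwise halve the step size.
    x = positions.copy()
    energy = effective_potential(x)
    for _ in range(n_steps):
        trial = x + step_size * numerical_forces(x)
        trial_energy = effective_potential(trial)
        if trial_energy < energy:
            x, energy = trial, trial_energy
        else:
            step_size *= 0.5
    return x

rng = np.random.default_rng(0)
start = 0.1 * rng.standard_normal((12, 2))  # tightly clustered toy "atoms"
final = relax(start)
print("entropy proxy before:", -effective_potential(start))
print("entropy proxy after: ", -effective_potential(final))
```

In this toy setting the atoms spread out in feature space as the effective potential decreases, which is the qualitative behaviour the abstract describes; a realistic implementation would use actual descriptors and analytic gradients.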

Danny Perez
Theoretical Division T-1, Los Alamos National Laboratory
  • TBA
  • TBA