Machine learning workflow for analysis of high-dimensional order parameter space: A case study of polymer crystallization from molecular dynamics simulations
E Tourani and BJ Edwards and B Khomami, JOURNAL OF CHEMICAL PHYSICS, 163, 164911 (2025).
DOI: 10.1063/5.0292454
Currently, crystallization pathways in polymers are identified from molecular simulation data by applying a preset cutoff to a single order parameter (OP) to define nucleated or crystallized regions. Aside from sensitivity to the cutoff, each of these OPs introduces its own systematic biases. In this study, an integrated machine learning workflow is presented to accurately quantify crystallinity in polymeric systems using atomistic molecular dynamics simulation data. Each atom is represented by a high-dimensional feature vector that combines geometric, thermodynamic-like, and symmetry-based descriptors. Low-dimensional embeddings are employed to expose latent structural fingerprints within atomic environments. Subsequently, unsupervised clustering on the embeddings is used to identify crystalline and amorphous atoms with high fidelity. After generating high-quality labels from the multidimensional data, we use supervised learning techniques to identify a minimal set of order parameters that can fully capture these labels. Various tests were conducted to reduce the feature set, and it is shown that only three order parameters, namely q6, Si, and p2, are sufficient to recreate the crystallization labels with great accuracy. Based on these OPs, the crystallinity index (C-index) is introduced as the logistic regression model's probability of crystallinity. This measure remains bimodal at all stages of the process and achieves >98% classification performance. Notably, a model trained on one or a few snapshots enables efficient on-the-fly computation of crystallinity. Lastly, we demonstrate how the optimal C-index fit evolves during various stages of crystallization, supporting the hypothesis that entropy dominates early nucleation, while q6 gains relevance in the later stages.
This workflow yields a data-driven strategy for OP selection and provides a generalizable metric to monitor structural transformations in large-scale polymer simulations.
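
The definition of the C-index as a logistic regression probability over three OPs can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic (q6, Si, p2) values, the two hypothetical class centers, the labels standing in for the clustering output, and the plain gradient-descent fit. The paper's actual training procedure, data, and coefficients are not given in this abstract.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def c_index(ops, weights, bias):
    """Logistic-model probability that an atom is crystalline.

    ops is a tuple (q6, Si, p2) of order-parameter values for one atom.
    """
    return sigmoid(bias + sum(w * x for w, x in zip(weights, ops)))


def fit_logistic(X, y, lr=1.0, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = c_index(xi, weights, bias) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Synthetic (q6, Si, p2) values: placeholders, NOT simulation data.
rng = random.Random(0)
CRYSTAL_CENTER = (0.5, 0.2, 0.8)    # hypothetical class center
AMORPHOUS_CENTER = (0.2, 0.6, 0.2)  # hypothetical class center


def sample(center, n):
    """Draw n noisy feature tuples around a class center."""
    return [tuple(rng.gauss(c, 0.05) for c in center) for _ in range(n)]


X = sample(CRYSTAL_CENTER, 100) + sample(AMORPHOUS_CENTER, 100)
y = [1] * 100 + [0] * 100  # labels stand in for the clustering output

weights, bias = fit_logistic(X, y)
scores = [c_index(x, weights, bias) for x in X]
accuracy = sum((s > 0.5) == bool(t) for s, t in zip(scores, y)) / len(y)
print(f"weights={weights}, bias={bias:.3f}, accuracy={accuracy:.3f}")
```

Once the weights and bias are fitted on one labeled snapshot, evaluating c_index for a new atom costs a single dot product and one sigmoid, which is consistent with the on-the-fly use described in the abstract.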