Machine learning interatomic potentials (MLIPs) have become powerful tools for extending molecular simulations beyond the limits of quantum methods, offering near-quantum accuracy at much lower computational cost. Yet developing reliable MLIPs remains difficult because it requires generating high-quality datasets, preprocessing atomic structures, and carefully training and validating models. In this work, we introduce an Automated Machine Learning Pipeline (AMLP) that unifies the entire workflow from dataset creation to model validation. AMLP employs large-language-model agents to assist with electronic-structure code selection, input preparation, and output conversion, while its analysis suite (AMLP-Analysis), based on ASE, supports a range of molecular simulations. The pipeline is built on the MACE architecture and validated on acridine polymorphs, where straightforward fine-tuning of a foundation model yields mean absolute errors of ~1.7 meV/atom in energies and ~7.0 meV/Å in forces. The fitted MLIP reproduces DFT geometries with sub-Å accuracy and remains stable during molecular dynamics simulations in the microcanonical and canonical ensembles.
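The error metrics quoted above (meV/atom for energies, meV/Å for forces) can be illustrated with a minimal sketch. This is not AMLP code: the function names `energy_mae_per_atom` and `force_mae` are hypothetical, and the conventions used here (energy errors normalized by atom count per structure, force errors averaged over every Cartesian component) are assumptions, since the abstract does not state how the averages are taken. Inputs are assumed to be in eV and eV/Å.

```python
from typing import Sequence, Tuple

Vector = Tuple[float, float, float]


def energy_mae_per_atom(
    e_ref: Sequence[float],
    e_pred: Sequence[float],
    n_atoms: Sequence[int],
) -> float:
    """Mean absolute energy error in meV/atom.

    Assumption: each structure's total-energy error (eV) is divided by
    its atom count before averaging over structures.
    """
    errors = [abs(p - r) / n for r, p, n in zip(e_ref, e_pred, n_atoms)]
    return 1000.0 * sum(errors) / len(errors)


def force_mae(
    f_ref: Sequence[Sequence[Vector]],
    f_pred: Sequence[Sequence[Vector]],
) -> float:
    """Mean absolute force error in meV/Å.

    Assumption: the average runs over every Cartesian component of
    every atom in every structure (forces given in eV/Å).
    """
    diffs = [
        abs(p - r)
        for struct_ref, struct_pred in zip(f_ref, f_pred)
        for atom_ref, atom_pred in zip(struct_ref, struct_pred)
        for r, p in zip(atom_ref, atom_pred)
    ]
    return 1000.0 * sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Toy numbers for illustration only; these are not data from the paper.
    e_mae = energy_mae_per_atom([-10.0, -20.0], [-10.002, -19.996], [2, 4])
    f_mae = force_mae(
        [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]],
        [[(0.01, 0.0, 0.0), (0.0, -0.02, 0.0)]],
    )
    print(f"energy MAE: {e_mae:.3f} meV/atom")
    print(f"force MAE:  {f_mae:.3f} meV/Å")
```

Normalizing the energy error by atom count keeps the metric comparable across cells of different size, which matters when the dataset mixes polymorphs with different numbers of atoms per unit cell.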