Coarse-grained (CG) molecular dynamics simulations enable efficient exploration of protein conformational ensembles. However, reconstructing atomic details from CG structures (backmapping) remains a challenging problem. Current approaches face an inherent trade-off between maintaining atomistic accuracy and exploring diverse conformations, often necessitating complex constraint handling or extensive refinement steps. To address these challenges, we introduce a novel two-stage framework, named CODLAD (COnstraint Decoupled LAtent Diffusion). This framework first compresses atomic structures into discrete latent representations, explicitly embedding structural constraints, thereby decoupling constraint handling from generation. Subsequently, it performs efficient denoising diffusion in this latent space to produce structurally valid and diverse all-atom conformations. Comprehensive evaluations on diverse protein datasets demonstrate that CODLAD achieves state-of-the-art performance in atomistic accuracy, conformational diversity, and computational efficiency while exhibiting strong generalization across different protein systems. Code is available at https://github.com/xiaoxiaokuye/CODLAD.
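To make the two-stage idea concrete, the following is a minimal toy sketch, not the CODLAD implementation. It assumes that the discrete latent is obtained by nearest-neighbour lookup in a codebook (as in vector quantization) and that diffusion is a standard Gaussian DDPM forward process applied to the quantized embeddings; the abstract does not specify either choice. The names `quantize`, `forward_noise`, the codebook size, and the noise schedule are all hypothetical.

```python
import numpy as np

def quantize(z, codebook):
    """Stage 1 (assumed): snap each continuous latent to its nearest code."""
    # z: (n, d) latents, codebook: (k, d) code vectors
    dist2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dist2.argmin(axis=1)          # discrete code index per latent
    return idx, codebook[idx]           # indices and quantized embeddings

def forward_noise(x0, t, alpha_bar, rng):
    """Stage 2 (assumed): sample x_t ~ q(x_t | x_0) of a Gaussian DDPM."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
codebook = rng.standard_normal((16, 4))   # hypothetical: 16 codes, dim 4
z = rng.standard_normal((10, 4))          # stand-in for encoder outputs
idx, zq = quantize(z, codebook)

betas = np.linspace(1e-4, 0.02, 100)      # hypothetical linear schedule
alpha_bar = np.cumprod(1.0 - betas)
t = 50
zt, eps = forward_noise(zq, t, alpha_bar, rng)
```

In a real system a learned denoiser would be trained to predict `eps` from `zt`, conditioned on the CG structure, and a decoder would map the denoised latents back to all-atom coordinates. Because the decoder is trained separately, structural constraints can be handled there rather than inside the diffusion model, which is the decoupling the abstract describes.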