Detecting ligand binding sites on proteins is a fundamental step in structure-based drug design. Despite notable advances in recent years, existing methods, datasets, and evaluation metrics face several key challenges: (1) current datasets and methods are centered on individual protein-ligand complexes and neglect that diverse binding sites may exist across multiple complexes of the same protein, which introduces significant statistical bias; (2) ligand binding site detection is typically modeled as a discontinuous workflow, in which binary segmentation is followed by a separate clustering algorithm; (3) traditional evaluation metrics do not adequately reflect the actual performance of different binding site prediction methods. To address these issues, we first introduce UniSite-DS, the first UniProt (Unique Protein)-centric ligand binding site dataset, which contains 4.81 times more multi-site data and 2.08 times more overall data than the previously most widely used datasets. We then propose UniSite, the first end-to-end ligand binding site detection framework supervised by a set prediction loss with bijective matching. In addition, we introduce Average Precision based on Intersection over Union (IoU) as a more accurate evaluation metric for ligand binding site prediction. Extensive experiments on UniSite-DS and several representative benchmark datasets demonstrate that IoU-based Average Precision reflects prediction quality more accurately, and that UniSite outperforms current state-of-the-art methods in ligand binding site detection. The dataset and code will be made publicly available at https://github.com/quanlin-wu/unisite.
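To make the IoU-based Average Precision idea concrete, the following is a minimal sketch for a single protein. It is not the evaluation code of UniSite. It assumes that each binding site is represented as a set of residue indices, that each prediction carries a confidence score, that a prediction counts as correct when its IoU with a not-yet-matched ground-truth site reaches a threshold (0.5 here), and that matching is done greedily in order of confidence. The function names `iou` and `average_precision` are hypothetical, and the exact definition used in the paper may differ.

```python
from typing import List, Set, Tuple


def iou(a: Set[int], b: Set[int]) -> float:
    """Intersection over Union of two residue-index sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_precision(
    preds: List[Tuple[float, Set[int]]],
    truths: List[Set[int]],
    iou_threshold: float = 0.5,  # assumed threshold, for illustration only
) -> float:
    """Non-interpolated AP for one protein (illustrative assumption).

    preds  : (confidence, residue set) pairs for predicted sites
    truths : residue sets of the ground-truth sites
    """
    if not truths:
        return 0.0

    matched = set()  # indices of ground-truth sites already claimed
    hits = []        # True/False per prediction, in confidence order

    for _, site in sorted(preds, key=lambda p: p[0], reverse=True):
        # Find the best still-unmatched ground-truth site for this prediction.
        best_iou, best_idx = 0.0, -1
        for idx, truth in enumerate(truths):
            if idx in matched:
                continue
            score = iou(site, truth)
            if score > best_iou:
                best_iou, best_idx = score, idx

        if best_idx >= 0 and best_iou >= iou_threshold:
            matched.add(best_idx)
            hits.append(True)
        else:
            hits.append(False)

    # Sum the precision at each true-positive rank, normalised by the
    # number of ground-truth sites.
    true_positives = 0
    precision_sum = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / len(truths)


if __name__ == "__main__":
    truths = [{1, 2, 3, 4}, {10, 11, 12}]
    preds = [
        (0.9, {1, 2, 3, 4}),      # exact match with the first site
        (0.8, {20, 21}),          # overlaps no ground-truth site
        (0.7, {10, 11, 12, 13}),  # IoU 0.75 with the second site
    ]
    print(average_precision(preds, truths))
```

Because every ground-truth site can be claimed by at most one prediction, a method that reports the same pocket several times is penalised for the duplicates, and a method that misses one of several sites on the same protein cannot reach a score of 1.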