Metamorphic Testing for Quality Assurance of Protein Function Prediction Tools

Proteins are the workhorses of life, and gaining insight into their functions is of paramount importance for applications such as drug design. However, experimentally validating protein functions is highly resource-consuming. As a result, automated protein function prediction (AFP) using machine learning has recently gained significant interest. Many AFP tools are based on supervised learning models trained on existing gold-standard functional annotations, which are known to be incomplete. The main challenge in systematically testing AFP software is the lack of a test oracle, the mechanism that determines whether a test case passes or fails. Because the gold-standard data are incomplete, the exact expected outcomes are not well defined for the AFP task. Thus, AFP tools face the oracle problem. In this work, we use metamorphic testing (MT) to test nine state-of-the-art AFP tools by defining a set of metamorphic relations (MRs) that apply input transformations to protein sequences. Our results show that several AFP tools fail all the test cases, raising concerns about the quality of their predictions.
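To make the idea of a metamorphic relation concrete, the following is a minimal sketch, not the MRs or tools evaluated in the paper. It assumes a hypothetical predictor interface (a function from a dict of protein ID to sequence onto a dict of protein ID to a set of labels) and one illustrative MR: reversing the order of the proteins in the input batch should leave each protein's predicted labels unchanged. The names `toy_predictor`, `buggy_predictor`, `mr_permutation`, and the label strings are all invented for illustration.

```python
from typing import Callable, Dict, List, Set

# Assumed interface: {protein_id: sequence} -> {protein_id: set of labels}.
Predictor = Callable[[Dict[str, str]], Dict[str, Set[str]]]


def toy_predictor(proteins: Dict[str, str]) -> Dict[str, Set[str]]:
    """Hypothetical stand-in for an AFP tool; labels depend only on the sequence."""
    predictions: Dict[str, Set[str]] = {}
    for protein_id, sequence in proteins.items():
        labels: Set[str] = set()
        if "C" in sequence:
            labels.add("toy:contains_cysteine")
        if len(sequence) > 10:
            labels.add("toy:long_chain")
        predictions[protein_id] = labels
    return predictions


def buggy_predictor(proteins: Dict[str, str]) -> Dict[str, Set[str]]:
    """Hypothetical faulty tool: the first protein in the batch gets an extra label."""
    predictions = toy_predictor(proteins)
    first_id = next(iter(proteins), None)
    if first_id is not None:
        predictions[first_id] = predictions[first_id] | {"toy:first_in_batch"}
    return predictions


def mr_permutation(predict: Predictor, proteins: Dict[str, str]) -> List[str]:
    """Check the illustrative MR and return the IDs whose predictions changed.

    No oracle is needed: the source and follow-up outputs are compared with
    each other, not with a known-correct answer.
    """
    source_output = predict(proteins)
    follow_up_input = dict(reversed(list(proteins.items())))  # transformed input
    follow_up_output = predict(follow_up_input)
    return [pid for pid in proteins if source_output[pid] != follow_up_output[pid]]


if __name__ == "__main__":
    batch = {"P1": "MKTAYIAKQRC", "P2": "MLS"}  # made-up sequences
    print("toy violations:  ", mr_permutation(toy_predictor, batch))
    print("buggy violations:", mr_permutation(buggy_predictor, batch))
```

An empty list means the relation held for that input, while any returned ID marks a violation worth investigating. A real study would replace the toy functions with wrappers around actual AFP tools and use relations grounded in domain knowledge about protein sequences.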

