Large-Scale Validation of Hypothesis Generation Systems via Candidate Ranking

The first step of many research projects is to define and rank a short list of candidates for study. Given the rapid pace of modern scientific progress, some researchers turn to automated hypothesis generation (HG) systems to aid this process. These systems can identify implicit or overlooked connections within a large scientific corpus, and while their importance grows alongside the pace of science, they lack thorough validation. Without a standard numerical evaluation method, many researchers validate general-purpose HG systems by rediscovering a handful of historical findings, and those wishing to be more thorough may run laboratory experiments based on automatic suggestions. These methods are expensive, time-consuming, and do not scale. We therefore present a numerical evaluation framework that validates HG systems using thousands of validation hypotheses. This method evaluates an HG system by its ability to rank hypotheses by plausibility, a process reminiscent of human candidate selection. Because some HG systems, specifically those that produce topic models, do not provide a ranking criterion, we additionally present novel metrics that quantify the plausibility of a hypothesis given topic model output. Finally, we demonstrate that our proposed validation method aligns with real-world research goals by deploying it within Moliere, our recent topic-driven HG system, to automatically generate a set of candidate genes related to HIV-associated neurodegenerative disease (HAND). By performing laboratory experiments based on this candidate set, we discover a new connection between HAND and Dead Box RNA Helicase 3 (DDX3). Reproducibility: code, validation data, and results can be found at sybrandt.com/2018/validation.
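To make the ranking idea concrete, the following is a minimal sketch of how an HG system might be scored by its ability to rank plausible hypotheses above implausible ones. It rests on assumptions not stated in the abstract: the validation set is split into plausible ("positive") and implausible ("negative") hypotheses, the summary statistic is the area under the ROC curve, and the names `ranking_auc`, `evaluate`, and `score` are hypothetical. It is not the authors' implementation.

```python
from typing import Callable, Iterable, Sequence, Tuple

# A hypothesis is modeled here as a pair of terms (an assumption for illustration).
Hypothesis = Tuple[str, str]


def ranking_auc(positive_scores: Iterable[float],
                negative_scores: Iterable[float]) -> float:
    """Return the probability that a randomly chosen positive hypothesis
    is scored above a randomly chosen negative one (ties count as half).
    This equals the area under the ROC curve of the induced ranking."""
    pos = list(positive_scores)
    neg = list(negative_scores)
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(score: Callable[[str, str], float],
             positives: Sequence[Hypothesis],
             negatives: Sequence[Hypothesis]) -> float:
    """Score every validation hypothesis with a plausibility metric
    (hypothetical `score`) and summarize the ranking quality as an AUC."""
    pos_scores = [score(a, b) for a, b in positives]
    neg_scores = [score(a, b) for a, b in negatives]
    return ranking_auc(pos_scores, neg_scores)
```

Under these assumptions, a value near 1.0 means the plausibility metric ranks almost every positive hypothesis above the negatives, while a value near 0.5 is no better than random ordering. The pairwise loop is quadratic and is chosen for clarity; a sort-based computation would be preferable for thousands of hypotheses.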
