Motivation: Combining the results of different experiments to exhibit complex patterns or to improve statistical power is a typical aim of data integration. The starting point of the statistical analysis is often a set of $p$-values resulting from previous analyses, which need to be combined flexibly to explore complex hypotheses while guaranteeing a low proportion of false discoveries.

Results: We introduce the generic concept of {\sl composed hypothesis}, which corresponds to an arbitrarily complex combination of simple hypotheses. We rephrase the problem of testing a composed hypothesis as a classification task, and show that finding the items for which the composed null hypothesis is rejected boils down to fitting a mixture model and classifying the items according to their posterior probabilities. We show that inference can be performed efficiently, and we provide a rigorous classification rule to control the type I error. The performance and usefulness of the approach are illustrated on simulations and on two different applications combining data of different types. The method is scalable, does not require any parameter tuning, and provided valuable biological insight in the application cases considered.

Availability: The QCH methodology is implemented in the \texttt{qch} R package, hosted on CRAN.
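The classification view of composed-hypothesis testing can be illustrated with a minimal Python sketch. It is not the \texttt{qch} implementation, and several parts are assumptions made here for illustration. Each item carries $Q$ simple hypotheses, so its latent state is one of the $2^Q$ joint configurations in $\{0,1\}^Q$. The tests are assumed conditionally independent given the configuration. The configuration weights and the per-test densities under $H_0$ and $H_1$ are taken as already estimated; the sketch does not fit them. The composed null posterior is the total posterior mass of configurations outside the composed alternative. Items are then selected by a standard Bayesian FDR rule: keep the largest set whose average posterior null probability is at most $\alpha$. All function names are hypothetical.

```python
from itertools import product

import numpy as np


def configurations(q):
    """All 2^q joint configurations of q simple hypotheses (0 = H0, 1 = H1)."""
    return list(product((0, 1), repeat=q))


def configuration_posteriors(weights, f0, f1, configs):
    """Posterior probability of each configuration for each item.

    Assumes conditional independence of the q tests given the configuration.
    weights: prior probability of each configuration (assumed already estimated).
    f0, f1: (n_items, q) densities of each item's statistic under H0 / H1.
    """
    n = f0.shape[0]
    lik = np.ones((n, len(configs)))
    for j, c in enumerate(configs):
        for k, h in enumerate(c):
            lik[:, j] *= f1[:, k] if h else f0[:, k]
    joint = lik * np.asarray(weights)
    return joint / joint.sum(axis=1, keepdims=True)


def composed_null_posterior(post, configs, in_alternative):
    """Posterior probability that the composed null holds for each item.

    in_alternative: predicate saying whether a configuration belongs to the
    composed alternative; every other configuration counts toward the null.
    """
    null_mask = np.array([not in_alternative(c) for c in configs])
    return post[:, null_mask].sum(axis=1)


def select_items(null_post, alpha):
    """Reject the largest set of items whose mean posterior null probability is <= alpha."""
    order = np.argsort(null_post)
    # After sorting ascending, the running mean is non-decreasing in the set size.
    running_mean = np.cumsum(null_post[order]) / np.arange(1, len(null_post) + 1)
    k = int(np.sum(running_mean <= alpha))
    rejected = np.zeros(len(null_post), dtype=bool)
    rejected[order[:k]] = True
    return rejected


# Usage: two tests per item; composed alternative = "H1 holds for both tests".
configs = configurations(2)
post = np.array([[0.01, 0.02, 0.02, 0.95],
                 [0.10, 0.20, 0.10, 0.60],
                 [0.70, 0.10, 0.10, 0.10]])
null_post = composed_null_posterior(post, configs, lambda c: all(c))
print(select_items(null_post, alpha=0.1))
```

Changing `in_alternative` is enough to test a different composed hypothesis, for instance `lambda c: sum(c) >= 1` for "at least one alternative holds". The configuration posteriors themselves do not have to be recomputed.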