The features in high-dimensional biomedical prediction problems are often well described by lower-dimensional manifolds. An example is genes that are organised in smaller functional networks. The outcome can then be described with the factor regression model. A benefit of the factor model is that it allows for straightforward inclusion of unlabeled observations in the estimation of the model, i.e., semi-supervised learning. In addition, the high-dimensional features in biomedical prediction problems are often well characterised. Examples are genes, for which annotation is available, and metabolites, for which $p$-values from a previous study are available. In this paper, the extra information on the features is included in the prior model for the features. The extra information is weighted and included in the estimation through empirical Bayes, with variational approximations to speed up the computation. The method is demonstrated in simulations and two applications. The first application considers influenza vaccine efficacy prediction based on microarray data. The second application predicts oral cancer metastasis from RNAseq data.
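To make the role of unlabeled observations concrete, a generic factor regression model can be sketched as follows. The notation is an assumption made here for illustration and is not necessarily that of the paper: $x_i \in \mathbb{R}^p$ are the features, $y_i$ is the outcome, $z_i \in \mathbb{R}^d$ are latent factors with $d \ll p$, $B$ is a $d \times p$ loading matrix, $\Psi$ is a diagonal covariance matrix, and $\beta \in \mathbb{R}^d$ holds the regression coefficients.

$$
\begin{aligned}
z_i &\sim \mathcal{N}(0, I_d), \\
x_i \mid z_i &\sim \mathcal{N}(B^\top z_i, \Psi), \\
y_i \mid z_i &\sim \mathcal{N}(\beta^\top z_i, \sigma^2).
\end{aligned}
$$

Under these assumptions, integrating out $z_i$ gives the marginal $x_i \sim \mathcal{N}(0, B^\top B + \Psi)$, which does not involve $y_i$. An observation without an outcome can therefore still contribute to the estimation of $B$ and $\Psi$ through this marginal likelihood, which is one way to see why the factor model lends itself to semi-supervised learning.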