Modern biomedical studies often collect multi-view data, that is, multiple types of data measured on the same set of objects. A popular model in high-dimensional multi-view data analysis decomposes each view's data matrix into three parts: a low-rank common-source matrix generated by latent factors shared across all data views, a low-rank distinctive-source matrix specific to that view, and an additive noise matrix. We propose a novel decomposition method for this model, called decomposition-based generalized canonical correlation analysis (D-GCCA). D-GCCA rigorously defines the decomposition on the L2 space of random variables, in contrast to the Euclidean dot product space used by most existing methods, and can therefore provide consistent estimation for low-rank matrix recovery. Moreover, to calibrate the common latent factors well, we impose a desirable orthogonality constraint on the distinctive latent factors. Existing methods, however, inadequately account for such orthogonality and may thus suffer a substantial loss of undetected common-source variation. D-GCCA goes one step further than generalized canonical correlation analysis by separating common and distinctive components among the canonical variables, while enjoying an appealing interpretation from the perspective of principal component analysis. Furthermore, we propose using the variable-level proportion of signal variance explained by common or distinctive latent factors to select the variables most influenced by those factors. We establish consistent estimators for D-GCCA that show good finite-sample numerical performance and have closed-form expressions, allowing efficient computation especially for large-scale data. The superiority of D-GCCA over state-of-the-art methods is further corroborated in simulations and real-world data examples.
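To make the data model and the variable-level proportion concrete, here is a minimal toy sketch. It only *generates* data from the common/distinctive/noise model X_k = C_k + D_k + E_k and computes the share of each variable's signal variance due to the common factors; it does not implement the D-GCCA estimator, which has to recover C_k and D_k from the observed X_k alone. It assumes Gaussian, unit-variance, mutually uncorrelated latent factors, so that var(c_i) + var(d_i) equals the signal variance. The function names `simulate_view` and `common_proportion` and all loading values are hypothetical choices for illustration.

```python
import numpy as np

def simulate_view(z, u, B, A, noise_sd, rng):
    """Generate one view X_k = C_k + D_k + E_k (rows = samples, columns = variables).

    z: (n, r0) common latent factors shared by every view
    u: (n, r_k) distinctive latent factors of this view
    B: (p_k, r0) hypothetical loadings on the common factors
    A: (p_k, r_k) hypothetical loadings on the distinctive factors
    """
    C = z @ B.T                                  # low-rank common-source matrix
    D = u @ A.T                                  # low-rank distinctive-source matrix
    E = noise_sd * rng.standard_normal(C.shape)  # additive noise matrix
    return C, D, E

def common_proportion(B, A):
    """Population share of each variable's signal variance due to common factors.

    Assumes unit-variance, mutually uncorrelated latent factors, so that
    var(c_i) = ||B_i||^2 and var(d_i) = ||A_i||^2. The distinctive share is
    one minus this value.
    """
    vc = np.sum(B**2, axis=1)
    vd = np.sum(A**2, axis=1)
    return vc / (vc + vd)

rng = np.random.default_rng(0)
n = 5000
z = rng.standard_normal((n, 1))   # one common factor shared by both views
u1 = rng.standard_normal((n, 2))  # distinctive factors of view 1
u2 = rng.standard_normal((n, 1))  # distinctive factors of view 2, drawn independently of u1

# Hypothetical loadings: view 1 has 3 variables, view 2 has 2 variables.
B1 = np.array([[2.0], [1.0], [0.0]])
A1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
B2 = np.array([[1.0], [0.5]])
A2 = np.array([[0.0], [1.0]])

C1, D1, E1 = simulate_view(z, u1, B1, A1, noise_sd=0.5, rng=rng)
C2, D2, E2 = simulate_view(z, u2, B2, A2, noise_sd=0.5, rng=rng)
X1, X2 = C1 + D1 + E1, C2 + D2 + E2  # the observed data matrices

pop1 = common_proportion(B1, A1)  # values 1.0, 0.5, 0.0
pop2 = common_proportion(B2, A2)  # values 1.0, 0.2
# Sample analogue of pop1, computed from the (normally unobserved) true C1 and D1.
sample1 = C1.var(axis=0) / (C1 + D1).var(axis=0)
```

In this toy setting, the first variable of view 1 is driven entirely by the common factor and the third entirely by a distinctive factor. Ranking variables by such proportions is the kind of selection the abstract describes, although in practice the proportions would be computed from estimated, not true, components.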