Learning by integrating multiple heterogeneous data sources is a common requirement in many tasks. Collective Matrix Factorization (CMF) is a technique for learning shared latent representations from arbitrary collections of matrices. It can be used to complete one or more matrices simultaneously, that is, to predict their unknown entries. Classical CMF methods assume that latent factors interact linearly, which can be restrictive and fails to capture complex non-linear interactions. In this paper, we develop the first deep-learning-based method, called dCMF, for unsupervised learning of multiple shared representations from an arbitrary collection of matrices; it can model such non-linear interactions. We address optimization challenges that arise from dependencies between shared representations through Multi-Task Bayesian Optimization, and we design an acquisition function adapted for collective learning of hyperparameters. Our experiments show that dCMF significantly outperforms previous CMF algorithms in integrating heterogeneous data for predictive modeling. Further, on two tasks, recommendation and prediction of gene-disease association, dCMF outperforms state-of-the-art matrix completion algorithms that can utilize auxiliary sources of information.
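To make the classical, linear CMF setting concrete, the following is a minimal sketch of the baseline idea described above, not of dCMF itself. It assumes just two matrices that share one entity: a partially observed matrix X (say, users by items) and a fully observed side-information matrix Y (users by features), both factorized with a common factor U. The function name `cmf_fit`, its parameters, and the plain gradient-descent optimizer are illustrative assumptions rather than anything specified in the paper.

```python
import numpy as np

def cmf_fit(X, mask, Y, k=3, lr=0.005, reg=0.01, steps=2000, seed=0):
    """Hypothetical linear CMF sketch: X ~ U @ V.T on observed entries,
    Y ~ U @ W.T, with the row factor U shared by both matrices."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    d = Y.shape[1]
    # Small random initialization of the latent factors.
    U = 0.1 * rng.standard_normal((n, k))
    V = 0.1 * rng.standard_normal((m, k))
    W = 0.1 * rng.standard_normal((d, k))
    losses = []
    for _ in range(steps):
        # Residuals; unobserved entries of X are masked out.
        Rx = mask * (U @ V.T - X)
        Ry = U @ W.T - Y
        loss = 0.5 * (Rx ** 2).sum() + 0.5 * (Ry ** 2).sum()
        loss += 0.5 * reg * ((U ** 2).sum() + (V ** 2).sum() + (W ** 2).sum())
        losses.append(loss)
        # Gradients of the squared-error objective; U receives signal
        # from both matrices, which is what makes the factorization collective.
        gU = Rx @ V + Ry @ W + reg * U
        gV = Rx.T @ U + reg * V
        gW = Ry.T @ U + reg * W
        U -= lr * gU
        V -= lr * gV
        W -= lr * gW
    return U, V, W, losses
```

After fitting, `U @ V.T` supplies predictions for the unobserved entries of X. The interaction between factors here is a plain inner product, which is the linearity assumption the abstract describes as restrictive.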