PO2PLS的异种数据统计综合与PO2PLS的统计 (Statistical Integration of Heterogeneous Data with PO2PLS) - 专知论文

会员服务 ·

0

Integration · 统计量 · Better · 估计/估计量 · Extensibility ·

2021 年 3 月 24 日

Statistical Integration of Heterogeneous Data with PO2PLS

翻译：PO2PLS的异种数据统计综合与PO2PLS的统计

Said el Bouhaddani,Hae-Won Uh,Geurt Jongbloed,Jeanine Houwing-Duistermaat

from arxiv, 36 pages, 4 figures, Submitted to Journal of the American Statistical Association

The availability of multi-omics data has revolutionized the life sciences by creating avenues for integrated system-level approaches. Data integration links the information across datasets to better understand the underlying biological processes. However, high-dimensionality, correlations and heterogeneity pose statistical and computational challenges. We propose a general framework, probabilistic two-way partial least squares (PO2PLS), which addresses these challenges. PO2PLS models the relationship between two datasets using joint and data-specific latent variables. For maximum likelihood estimation of the parameters, we implement a fast EM algorithm and show that the estimator is asymptotically normally distributed. A global test for testing the relationship between two datasets is proposed, and its asymptotic distribution is derived. Notably, several existing omics integration methods are special cases of PO2PLS. Via extensive simulations, we show that PO2PLS performs better than alternatives in feature selection and prediction performance. In addition, the asymptotic distribution appears to hold when the sample size is sufficiently large. We illustrate PO2PLS with two examples from commonly used study designs: a large population cohort and a small case-control study. Besides recovering known relationships, PO2PLS also identified novel findings. The methods are implemented in our R-package PO2PLS. Supplementary materials for this article are available online.

翻译：多组数据的可用性通过创造综合系统层面方法的渠道,使生命科学发生革命性的变化。数据整合将跨数据集的信息连接起来,以更好地了解基本生物过程。然而,高维度、相关性和异质性提出了统计和计算方面的挑战。我们提出了一个总体框架,即双向双向半方方方形(PO2PLS),用以应对这些挑战。PO2PLS模型使用联合和特定数据的潜在变量来模拟两个数据集之间的关系。为了对参数进行最大的可能性估计,我们采用了快速EM算法,并表明估计数据是正常分布的。提出了用于测试两个数据集之间关系的全球测试,并提出了统计和计算方面的分布。值得注意的是,现有的几种双向双向半方半方方方方方形(PO2PLS)的整合方法是PO2PLS的特殊案例。我们通过广泛的模拟,显示PO2PLS在特性选择和预测性能方面的表现比替代品要好得多。此外,当样本大小时,我们似乎会保持着一种快速的分布。我们用PO2PLS演示了两个模型来说明我们所了解的常规研究的样本。我们所研究中的大规模的POPLS 。我们所要用的新研究中发现的样本。我们所知道的两组的样本分析方法。

0

相关内容

Integration

Integration：Integration, the VLSI Journal。 Explanation：集成，VLSI杂志。 Publisher：Elsevier。 SIT：http://dblp.uni-trier.de/db/journals/integration/

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

54+阅读 · 2020年9月7日

【UIUC硬核书】统计学习理论，Statistical Learning Theory，213页pdf

【UIUC硬核书】统计学习理论，Statistical Learning Theory，213页pdf

专知会员服务

134+阅读 · 2020年4月14日

【WWW2020-MAGNN】异质图嵌入的集合图神经网络 MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding

【WWW2020-MAGNN】异质图嵌入的集合图神经网络 MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding

专知会员服务

116+阅读 · 2020年2月10日

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

专知会员服务

21+阅读 · 2019年12月2日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

计算机 | CCF推荐期刊专刊信息5条

计算机 | CCF推荐期刊专刊信息5条

Call4Papers

3+阅读 · 2019年4月10日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

专知

19+阅读 · 2018年6月26日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【计算机类】期刊专刊/国际会议截稿信息6条

【计算机类】期刊专刊/国际会议截稿信息6条

Call4Papers

3+阅读 · 2017年10月13日

Estimating heterogeneous survival treatment effect in observational data using machine learning

Arxiv

0+阅读 · 2021年5月19日

Learning with tree tensor networks: complexity estimates and model selection

Arxiv

0+阅读 · 2021年5月19日

The convergence speed of MLE to the information projection of an exponential family -- a criteria for the model dimension and the sample size -- with complete proof

Arxiv

0+阅读 · 2021年5月19日

rx-anon -- A Novel Approach on the De-Identification of Heterogeneous Data based on a Modified Mondrian Algorithm

Arxiv

0+阅读 · 2021年5月18日

Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data

Arxiv

9+阅读 · 2021年2月8日

SEEK: Segmented Embedding of Knowledge Graphs

Arxiv

8+阅读 · 2020年5月2日

A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics

Arxiv

7+阅读 · 2018年6月12日

Probabilistic Embedding of Knowledge Graphs with Box Lattice Measures

Arxiv

6+阅读 · 2018年5月17日

Stable Distribution Alignment Using the Dual of the Adversarial Distance

Arxiv

3+阅读 · 2018年1月30日

Knowledge Graph Embedding with Multiple Relation Projections

Arxiv

4+阅读 · 2018年1月26日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

54+阅读 · 2020年9月7日

【UIUC硬核书】统计学习理论，Statistical Learning Theory，213页pdf

【UIUC硬核书】统计学习理论，Statistical Learning Theory，213页pdf

专知会员服务

134+阅读 · 2020年4月14日

【WWW2020-MAGNN】异质图嵌入的集合图神经网络 MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding

【WWW2020-MAGNN】异质图嵌入的集合图神经网络 MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding

专知会员服务

116+阅读 · 2020年2月10日

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

专知会员服务

21+阅读 · 2019年12月2日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

《以任务为中心的建模未来：将集成数字成熟度路径与用户故事框架融入任务工程》最新文献

《人机协作集成模型中的不确定性捕获》博士论文

运用不可解释人工智能进行军事决策

《以军铁剑战争中的战场决策》最新报告

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

计算机 | CCF推荐期刊专刊信息5条

计算机 | CCF推荐期刊专刊信息5条

Call4Papers

3+阅读 · 2019年4月10日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

专知

19+阅读 · 2018年6月26日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【计算机类】期刊专刊/国际会议截稿信息6条

【计算机类】期刊专刊/国际会议截稿信息6条

Call4Papers

3+阅读 · 2017年10月13日

相关论文

Estimating heterogeneous survival treatment effect in observational data using machine learning

Arxiv

0+阅读 · 2021年5月19日

Learning with tree tensor networks: complexity estimates and model selection

Arxiv

0+阅读 · 2021年5月19日

The convergence speed of MLE to the information projection of an exponential family -- a criteria for the model dimension and the sample size -- with complete proof

Arxiv

0+阅读 · 2021年5月19日

rx-anon -- A Novel Approach on the De-Identification of Heterogeneous Data based on a Modified Mondrian Algorithm

Arxiv

0+阅读 · 2021年5月18日

Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data

Arxiv

9+阅读 · 2021年2月8日

SEEK: Segmented Embedding of Knowledge Graphs

Arxiv

8+阅读 · 2020年5月2日

A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics

Arxiv

7+阅读 · 2018年6月12日

Probabilistic Embedding of Knowledge Graphs with Box Lattice Measures

Arxiv

6+阅读 · 2018年5月17日

Stable Distribution Alignment Using the Dual of the Adversarial Distance

Arxiv

3+阅读 · 2018年1月30日

Knowledge Graph Embedding with Multiple Relation Projections

Arxiv

4+阅读 · 2018年1月26日

微信扫码咨询专知VIP会员