Generalized Matrix Factorization - Zhuanzhi Papers

Factorization · Latent variables · MoDELS · Reducible · Factor analysis

August 3, 2021

Generalized Matrix Factorization

Łukasz Kidziński, Francis K. C. Hui, David I. Warton, Trevor Hastie

Unmeasured or latent variables are often the cause of correlations between multivariate measurements and are studied in a variety of fields such as psychology, ecology, and medicine. For Gaussian measurements, there are classical tools such as factor analysis or principal component analysis with a well-established theory and fast algorithms. Generalized Linear Latent Variable models (GLLVM) generalize such factor models to non-Gaussian responses. However, current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets with thousands of observational units or responses. In this article, we propose a new approach for fitting GLLVMs to such high-volume, high-dimensional datasets. We approximate the likelihood using penalized quasi-likelihood and use a Newton method and Fisher scoring to learn the model parameters. Our method greatly reduces the computation time and can be easily parallelized, enabling factorization at unprecedented scale using commodity hardware. We illustrate application of our method on a dataset of 48,000 observational units with over 2,000 observed species in each unit, finding that most of the variability can be explained with a handful of factors.
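
The fitting idea named in the abstract (a penalized likelihood maximized by Newton / Fisher-scoring updates) can be illustrated with a small sketch. This is not the authors' implementation. It assumes Poisson responses with a log link, a linear predictor `eta = U @ V.T`, a ridge penalty `lam` on both factor matrices, and a backtracking safeguard on the step size; all function names here are hypothetical. With the canonical log link, Fisher scoring and Newton's method coincide, so one update rule covers both.

```python
import numpy as np

def penalized_loglik(Y, U, V, lam):
    """Poisson log-likelihood (up to a constant) minus a ridge penalty."""
    eta = U @ V.T
    with np.errstate(over="ignore"):  # a rejected trial step may overflow to -inf
        fit = np.sum(Y * eta - np.exp(eta))
    return fit - 0.5 * lam * (np.sum(U**2) + np.sum(V**2))

def scoring_step(Y, A, B, lam):
    """Fisher-scoring direction for every row of A with B held fixed (eta = A @ B.T)."""
    mu = np.exp(A @ B.T)
    grad = (Y - mu) @ B - lam * A  # penalized score, one row per unit
    # Expected information for row i: B^T diag(mu_i) B + lam * I
    info = np.einsum("ij,jk,jl->ikl", mu, B, B) + lam * np.eye(B.shape[1])
    # Each row solves its own small d x d system, so rows are independent.
    return np.linalg.solve(info, grad[..., None])[..., 0]

def fit_poisson_factors(Y, d=2, lam=1.0, n_iter=30, seed=0):
    """Alternate scoring updates for the scores U and the loadings V."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = 0.1 * rng.standard_normal((n, d))
    V = 0.1 * rng.standard_normal((m, d))
    history = [penalized_loglik(Y, U, V, lam)]
    for _ in range(n_iter):
        for block in ("U", "V"):
            if block == "U":
                delta = scoring_step(Y, U, V, lam)
            else:
                delta = scoring_step(Y.T, V, U, lam)
            step = 1.0
            for _attempt in range(20):  # halve the step until the objective does not drop
                U_new = U + step * delta if block == "U" else U
                V_new = V + step * delta if block == "V" else V
                obj = penalized_loglik(Y, U_new, V_new, lam)
                if obj >= history[-1]:
                    U, V = U_new, V_new
                    history.append(obj)
                    break
                step *= 0.5
    return U, V, history

# Synthetic example: 60 units, 40 responses, 2 latent factors.
rng = np.random.default_rng(1)
U_true = rng.normal(scale=0.5, size=(60, 2))
V_true = rng.normal(scale=0.5, size=(40, 2))
Y = rng.poisson(np.exp(U_true @ V_true.T))
U_hat, V_hat, history = fit_poisson_factors(Y, d=2)
```

Because each row's update only needs its own small linear system, the per-row solves are independent of one another, which is the kind of structure that makes such updates easy to parallelize. The factors are identified only up to an invertible transformation, so `U_hat` and `V_hat` should be compared with the truth through the fitted product `U_hat @ V_hat.T` rather than entry by entry.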

