Across fields such as machine learning, social science, and geography, considerable attention has been given to models that factorize a nonnegative matrix into the product of two or three matrices, subject to nonnegativity or row-sum-to-1 constraints. Although these models are to a large extent similar or even equivalent, they are presented under different names, and their similarity is not well known. This paper highlights similarities among five popular models: latent budget analysis (LBA), latent class analysis (LCA), end-member analysis (EMA), probabilistic latent semantic analysis (PLSA), and nonnegative matrix factorization (NMF). We focus on an essential issue of these models, identifiability, and prove that the solution of LBA, EMA, LCA, and PLSA is unique if and only if the solution of NMF is unique. We also provide a brief review of algorithms for these models. We illustrate the models with a time budget dataset from social science, and end the paper with a discussion of closely related models such as archetypal analysis.
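To make the link between the nonnegativity and row-sum-to-1 constraints concrete, the following minimal sketch rescales a nonnegative factorization P = WH of a row-stochastic matrix P into a product AB in which both factors have rows summing to 1. It relies on a standard diagonal rescaling and is not taken from the paper's proofs; the helper names `matmul` and `to_row_stochastic_factors` and the toy matrices are hypothetical, and every row of H is assumed to have a positive sum.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def to_row_stochastic_factors(W, H):
    """Rescale a nonnegative factorization P = W H into P = A B.

    B = diag(d)^-1 H has rows summing to 1, where d holds the row sums of H.
    A = W diag(d) then has rows summing to 1 whenever the rows of P do,
    because A 1 = W d = W H 1 = P 1.
    """
    d = [sum(row) for row in H]  # row sums of H
    if any(s <= 0 for s in d):
        raise ValueError("every row of H needs a positive sum")
    A = [[w * s for w, s in zip(row, d)] for row in W]  # A = W diag(d)
    B = [[h / s for h in row] for row, s in zip(H, d)]  # B = diag(d)^-1 H
    return A, B


# Hypothetical toy factors, chosen so that P = W H has rows summing to 1.
W = [[0.5, 0.0], [0.1, 0.2], [0.0, 0.25]]
H = [[1.0, 1.0, 0.0], [0.0, 2.0, 2.0]]
P = matmul(W, H)
A, B = to_row_stochastic_factors(W, H)
```

The rescaling leaves the product unchanged, so any nonnegative factorization of a row-stochastic matrix with no all-zero row in H can be rewritten in this constrained form.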