Bayesian Allocation Model: Inference by Sequential Monte Carlo for Nonnegative Tensor Factorizations and Topic Models using Polya Urns

We introduce a dynamic generative model, the Bayesian allocation model (BAM), which establishes explicit connections between nonnegative tensor factorization (NTF), graphical models of discrete probability distributions and their Bayesian extensions, and topic models such as latent Dirichlet allocation. BAM is based on a Poisson process whose events are marked using a Bayesian network; the conditional probability tables of this network are then integrated out analytically. We show that the resulting marginal process is a Polya urn, an integer-valued self-reinforcing process. This urn process, which we name a Polya-Bayes process, obeys certain conditional independence properties that provide further insight into the nature of NTF. These insights also let us develop space-efficient simulation algorithms that respect the potential sparsity of the data: we propose a class of sequential importance sampling algorithms for computing NTF and approximating its marginal likelihood, which is useful for model selection. The resulting methods can also be viewed as a model scoring method for topic models and for discrete Bayesian networks with hidden variables. The new algorithms have favourable properties in the sparse data regime when contrasted with variational algorithms, which become more accurate as the sum of the entries of the observed tensor goes to infinity. We illustrate the performance on several examples and numerically study the behaviour of the algorithms in various data regimes.
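
To make the "self-reinforcing" urn idea concrete, here is a minimal Python sketch of a single Polya urn. It is not the paper's Polya-Bayes process or its sequential importance sampler; it only illustrates the standard fact that integrating out a Dirichlet-distributed probability vector from a categorical model yields an urn whose predictive probabilities depend on past counts. The function names `polya_urn` and `dirichlet_categorical_log_marginal` and the pseudo-count vector `alpha` are illustrative assumptions, not names from the paper.

```python
import math
import random


def polya_urn(alpha, num_draws, seed=0):
    """Simulate a Polya urn with initial pseudo-counts `alpha` (hypothetical helper).

    Returns the final counts and the log-probability of the drawn sequence,
    accumulated one predictive step at a time.
    """
    rng = random.Random(seed)
    counts = [0] * len(alpha)
    log_prob = 0.0
    for _ in range(num_draws):
        # Predictive weights: pseudo-count plus the number of earlier draws.
        weights = [a + c for a, c in zip(alpha, counts)]
        total = sum(weights)
        k = rng.choices(range(len(alpha)), weights=weights)[0]
        log_prob += math.log(weights[k] / total)
        counts[k] += 1  # self-reinforcement: colour k becomes more likely
    return counts, log_prob


def dirichlet_categorical_log_marginal(alpha, counts):
    """Closed-form log-probability of a sequence with the given counts under
    a categorical likelihood with a Dirichlet(alpha) prior integrated out."""
    a0, n = sum(alpha), sum(counts)
    out = math.lgamma(a0) - math.lgamma(a0 + n)
    for a, c in zip(alpha, counts):
        out += math.lgamma(a + c) - math.lgamma(a)
    return out


if __name__ == "__main__":
    alpha = [0.5, 1.0, 2.0]
    counts, log_prob = polya_urn(alpha, num_draws=50, seed=1)
    print(counts, log_prob, dirichlet_categorical_log_marginal(alpha, counts))
```

The sequentially accumulated log-probability agrees with the closed-form Dirichlet-categorical marginal, which is the single-urn analogue of computing a marginal likelihood by sequential simulation.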
