Model-based clustering of moderate- or high-dimensional data is notoriously difficult. We propose a model for simultaneous dimensionality reduction and clustering that assumes a mixture model for a set of latent scores, which are linked to the observations through a Gaussian latent factor model. This approach was recently investigated by Chandra et al. (2020), who use a factor-analytic representation and assume a mixture model for the latent factors. However, performance can deteriorate in the presence of model misspecification. We show that assuming a repulsive point process prior for the component-specific means of the mixture for the latent scores yields a more robust model, which outperforms the standard mixture model for the latent factors in several simulated scenarios. To favor well-separated clusters, the repulsive point process must be anisotropic, and its density should be tractable so that posterior inference remains efficient. We address these issues by proposing a general construction for anisotropic determinantal point processes.
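The sketch below illustrates the two ingredients described above under simplifying assumptions of our own. It is not the authors' construction. First, it simulates data from the generative structure: latent scores eta_i drawn from a Gaussian mixture in R^d, mapped to observations by y_i = Lambda eta_i + eps_i. It assumes unit-variance isotropic components, i.i.d. N(0, 1) loadings, and isotropic Gaussian noise with standard deviation `noise_sd`. Second, it scores a set of component means with a simple repulsive term: the log-determinant of a Gaussian-kernel Gram matrix C(x, z) = rho * exp(-(x - z)^T A (x - z)). In this term, a positive-definite matrix `A` makes the repulsion anisotropic. This log-det form is a common simplification used in DPP-style repulsive priors, not the general construction proposed here. All function and parameter names are hypothetical.

```python
import numpy as np


def simulate_latent_factor_mixture(n, p, d, means, weights, rng, noise_sd=0.5):
    """Draw y_i = Lambda @ eta_i + eps_i, with eta_i from a Gaussian mixture in R^d.

    Assumptions (illustrative only): unit-variance isotropic mixture components,
    i.i.d. N(0, 1) factor loadings, isotropic Gaussian noise with sd noise_sd.
    """
    k = len(weights)
    labels = rng.choice(k, size=n, p=weights)           # cluster allocations
    eta = means[labels] + rng.standard_normal((n, d))   # latent scores
    Lambda = rng.standard_normal((p, d))                # p x d loadings matrix
    eps = noise_sd * rng.standard_normal((n, p))        # idiosyncratic noise
    y = eta @ Lambda.T + eps                            # n x p observations
    return y, labels, eta, Lambda


def anisotropic_repulsive_log_density(means, A, rho=1.0):
    """Unnormalized log-density log det(C) for component means (k x d).

    C[i, j] = rho * exp(-(m_i - m_j)^T A (m_i - m_j)); A is positive definite.
    Nearby means make C close to singular, so the value drops toward -inf.
    """
    diff = means[:, None, :] - means[None, :, :]        # k x k x d differences
    quad = np.einsum("ijk,kl,ijl->ij", diff, A, diff)   # anisotropic squared distances
    C = rho * np.exp(-quad)
    sign, logdet = np.linalg.slogdet(C)
    return logdet if sign > 0 else -np.inf


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = np.array([[-3.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
    y, labels, eta, Lam = simulate_latent_factor_mixture(
        200, 10, 2, means, np.array([0.3, 0.4, 0.3]), rng
    )
    print(y.shape)

    # Strong repulsion along the first axis, weak along the second.
    A = np.diag([1.0, 0.01])
    print(anisotropic_repulsive_log_density(np.array([[0.0, 0.0], [3.0, 0.0]]), A))
    print(anisotropic_repulsive_log_density(np.array([[0.0, 0.0], [0.0, 3.0]]), A))
```

With `A = diag(1, 0.01)`, two means separated by 3 units along the first axis receive a log-density close to 0, which is the maximum when `rho = 1`. The same separation along the second axis is penalized more heavily (about -1.8). This illustrates how an anisotropic kernel can favor separation in some directions of the latent space more than others.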