A mixture of multivariate contaminated normal (MCN) distributions is a useful model-based clustering technique for data sets that contain mild outliers. However, this model can only be fitted to complete data sets, which are often unavailable in real applications. In this paper, we develop a framework for fitting a mixture of MCN distributions to incomplete data sets, i.e., data sets in which some values are missing at random. We employ the expectation-conditional maximization (ECM) algorithm for parameter estimation. In a simulation study, we compare the results of our model with those of a mixture of Student's t distributions for incomplete data.
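The abstract does not state the MCN density. The following is a minimal sketch, not the authors' ECM implementation. It assumes the usual contaminated-normal parameterization, in which each component is alpha * N(mu, Sigma) + (1 - alpha) * N(mu, eta * Sigma), with alpha in (0, 1) the proportion of "good" points and eta > 1 the covariance inflation for mild outliers. It also assumes missing values are coded as NaN and are missing at random. Under that assumption the missing coordinates can be integrated out. Because both normal parts of a component share the same mean, the marginal on the observed coordinates is again an MCN, using the observed sub-vector of mu and sub-block of Sigma. All function names are hypothetical.

```python
import numpy as np


def mvn_logpdf(x, mean, cov):
    """Multivariate normal log-density, computed via a Cholesky factor."""
    d = x.shape[0]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, x - mean)          # whitened residual
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + z @ z)


def mcn_logpdf(x, mean, cov, alpha, eta):
    """Contaminated normal: alpha*N(mean, cov) + (1 - alpha)*N(mean, eta*cov)."""
    good = np.log(alpha) + mvn_logpdf(x, mean, cov)
    bad = np.log1p(-alpha) + mvn_logpdf(x, mean, eta * cov)
    return np.logaddexp(good, bad)


def mixture_mcn_logpdf_observed(x, weights, means, covs, alphas, etas):
    """Observed-data log-density of one point x (NaN = missing) under a
    mixture of MCN components, marginalizing out the missing coordinates."""
    obs = ~np.isnan(x)
    if not obs.any():
        return 0.0  # nothing observed: the marginal integrates to 1
    xo = x[obs]
    terms = []
    for pi_g, mu, sigma, a, e in zip(weights, means, covs, alphas, etas):
        # Marginal of an MCN on the observed coordinates is again an MCN.
        terms.append(np.log(pi_g)
                     + mcn_logpdf(xo, mu[obs], sigma[np.ix_(obs, obs)], a, e))
    return np.logaddexp.reduce(np.array(terms))


# Usage: a 2-D point with its second coordinate missing, two components.
x = np.array([0.3, np.nan])
means = [np.zeros(2), np.array([3.0, 3.0])]
covs = [np.array([[2.0, 0.5], [0.5, 1.0]]), np.eye(2)]
print(mixture_mcn_logpdf_observed(x, [0.6, 0.4], means, covs, [0.9, 0.8], [5.0, 4.0]))
```

Summing this quantity over all points gives the observed-data log-likelihood that an EM-type algorithm such as ECM increases at each iteration. Working in log space with `logaddexp` avoids underflow when a point lies far from every component.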