Latent variable models can be used to probabilistically "fill in" missing data entries. The variational autoencoder architecture (Kingma and Welling, 2014; Rezende et al., 2014) includes a "recognition" or "encoder" network that infers the latent variables given the data variables. However, it is not clear how this network should handle data variables that are missing. The factor analysis (FA) model is a basic autoencoder whose encoder and decoder networks are both linear. We show how to calculate the latent posterior distribution of the FA model exactly in the presence of missing data, and note that this solution implies that a different encoder network is required for each pattern of missingness. We also discuss various approximations to the exact solution. Experiments compare the effectiveness of these approaches at filling in the missing data.
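To make the exact FA posterior concrete, the following is a minimal sketch rather than the paper's implementation. It assumes the standard FA parameterization x = Wz + mu + eps, with z ~ N(0, I) and eps ~ N(0, diag(psi)), and it assumes that missing entries are marked with NaN. The function names `fa_posterior` and `fa_impute` are hypothetical. Restricting W, mu, and psi to the observed rows gives the posterior covariance (I + W_o^T Psi_o^{-1} W_o)^{-1} and the posterior mean cov · W_o^T Psi_o^{-1} (x_o − mu_o); missing entries are then filled in with their conditional mean.

```python
import numpy as np

def fa_posterior(x, W, mu, psi):
    """Exact Gaussian posterior p(z | x_o) using only the observed entries of x.

    x:   (D,) data vector, with np.nan marking missing entries (assumed convention)
    W:   (D, K) factor loading matrix
    mu:  (D,) data mean
    psi: (D,) diagonal noise variances
    """
    obs = ~np.isnan(x)
    W_o = W[obs]                              # loadings of observed variables
    Wt_Pinv = W_o.T / psi[obs]                # W_o^T Psi_o^{-1}, shape (K, D_o)
    precision = np.eye(W.shape[1]) + Wt_Pinv @ W_o
    cov = np.linalg.inv(precision)            # posterior covariance of z
    mean = cov @ (Wt_Pinv @ (x[obs] - mu[obs]))  # posterior mean of z
    return mean, cov

def fa_impute(x, W, mu, psi):
    """Fill missing entries with their conditional mean E[x_m | x_o]."""
    mean, _ = fa_posterior(x, W, mu, psi)
    miss = np.isnan(x)
    x_filled = x.copy()
    x_filled[miss] = mu[miss] + W[miss] @ mean
    return x_filled
```

In this sketch the linear map from x_o to the posterior mean, `cov @ Wt_Pinv`, is built from the observed rows only, so it changes whenever the set of observed variables changes. This is the sense in which each pattern of missingness needs its own encoder.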