Multivariate time series with missing values are common in many areas, such as healthcare and finance. To address this problem, modern data imputation approaches should (a) be tailored to sequential data, (b) deal with high-dimensional and complex data distributions, and (c) be based on the probabilistic modeling paradigm for interpretability and confidence assessment. However, many current approaches fall short in at least one of these aspects. Drawing on advances in deep learning and scalable probabilistic modeling, we propose a new deep sequential variational autoencoder (VAE) approach for dimensionality reduction and data imputation. Temporal dependencies in the latent space are modeled with a Gaussian process (GP) prior that uses a Cauchy kernel to reflect multi-scale dynamics. We furthermore use a structured variational inference distribution that improves the scalability of the approach. We demonstrate that our model achieves superior imputation performance on benchmark tasks and on challenging real-world medical data.
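To make the latent-space prior more concrete, the following is a minimal NumPy sketch of a GP prior with a Cauchy kernel over discrete time steps. It is an illustration under stated assumptions, not the authors' implementation: the function names, the kernel hyperparameters (`sigma`, `length_scale`), the jitter term, and the use of integer time indices as GP inputs are all hypothetical choices for this sketch, and each latent dimension is given an independent GP prior. The Cauchy kernel k(t, t') = sigma^2 / (1 + (t - t')^2 / length_scale^2) is a rational quadratic kernel with shape parameter 1, which can be read as a mixture of RBF kernels over a continuum of length scales. A squared-exponential (RBF) kernel is included only to compare how quickly correlations decay.

```python
import numpy as np


def cauchy_kernel(times, sigma=1.0, length_scale=2.0):
    """Cauchy kernel: k(t, t') = sigma^2 / (1 + (t - t')^2 / length_scale^2).

    Hyperparameter defaults are illustrative assumptions, not values from the paper.
    """
    diff = times[:, None] - times[None, :]
    return sigma**2 / (1.0 + (diff / length_scale) ** 2)


def rbf_kernel(times, sigma=1.0, length_scale=2.0):
    """Squared-exponential kernel, included only for comparison with the Cauchy kernel."""
    diff = times[:, None] - times[None, :]
    return sigma**2 * np.exp(-0.5 * (diff / length_scale) ** 2)


def sample_gp_prior(num_steps, latent_dim, sigma=1.0, length_scale=2.0,
                    jitter=1e-6, seed=0):
    """Draw latent trajectories z[:, j] ~ N(0, K), one independent GP per latent dimension.

    Assumption of this sketch: the GP inputs are the integer time steps 0..num_steps-1.
    """
    rng = np.random.default_rng(seed)
    times = np.arange(num_steps, dtype=float)
    # A small diagonal jitter keeps the Cholesky factorization numerically stable.
    K = cauchy_kernel(times, sigma, length_scale) + jitter * np.eye(num_steps)
    L = np.linalg.cholesky(K)  # K = L @ L.T
    eps = rng.standard_normal((num_steps, latent_dim))
    # Each column of L @ eps is a sample from N(0, K).
    return L @ eps  # shape: (num_steps, latent_dim)
```

For example, with the default length scale of 2, the Cauchy covariance between time steps 10 apart is 1/26 (about 0.038), whereas the RBF covariance is exp(-12.5) (about 3.7e-6). This heavier tail lets a single kernel capture both short- and long-range temporal correlations in the latent trajectories.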