Asymptotically Exact and Fast Gaussian Copula Models for Imputation of Mixed Data Types

Missing values in data sets with mixed data types are a common problem in many machine learning applications, such as survey processing and various medical applications. Recently, Gaussian copula models have been suggested as a means of imputing missing values within a probabilistic framework. While existing Gaussian copula models have been shown to yield state-of-the-art performance, they have two limitations: they rely on an approximation that is fast but may be imprecise, and they do not support unordered multinomial variables. We address the first limitation by using direct and arbitrarily precise approximations, computed with randomized quasi-Monte Carlo procedures, for both model estimation and imputation. The resulting method has lower errors in the estimated model parameters and in the imputed values than previously proposed methods. We also extend the previous Gaussian copula models to include unordered multinomial variables, in addition to the existing support for ordinal, binary, and continuous variables.
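To make the idea of a randomized quasi-Monte Carlo approximation concrete, here is a minimal Python sketch that estimates the probability that a correlated Gaussian vector falls in a box, which is the kind of quantity that arises when latent Gaussian variables are constrained by observed ordinal or binary values. It is an illustration under stated assumptions and not the authors' algorithm: the function name `rqmc_box_probability`, its parameters, and the plain indicator-based estimator are all hypothetical choices made here, and the abstract does not say which integrand transformation the paper uses.

```python
import numpy as np
from scipy.stats import norm, qmc


def rqmc_box_probability(corr, lower, upper, n_points=4096, n_shifts=8, seed=0):
    """Estimate P(lower < Z < upper) for Z ~ N(0, corr) with scrambled Sobol points.

    Hypothetical helper for illustration only. Returns the mean over
    independent scramblings and a standard error based on their spread.
    """
    corr = np.asarray(corr, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    chol = np.linalg.cholesky(corr)  # corr = chol @ chol.T
    rng = np.random.default_rng(seed)

    estimates = []
    for _ in range(n_shifts):
        # Each scrambling is an independent randomization of the Sobol sequence.
        sobol = qmc.Sobol(d=corr.shape[0], scramble=True, seed=rng)
        u = sobol.random(n_points)
        u = np.clip(u, 1e-12, 1.0 - 1e-12)  # keep norm.ppf finite
        z = norm.ppf(u) @ chol.T  # correlated standard normal draws
        inside = np.all((z > lower) & (z < upper), axis=1)
        estimates.append(inside.mean())

    estimates = np.asarray(estimates)
    return estimates.mean(), estimates.std(ddof=1) / np.sqrt(n_shifts)


# Positive-orthant probability of a bivariate normal with correlation 0.5.
prob, std_err = rqmc_box_probability(
    [[1.0, 0.5], [0.5, 1.0]], lower=[0.0, 0.0], upper=[np.inf, np.inf]
)
```

For this two-dimensional case the exact value is known in closed form, 1/4 + arcsin(0.5)/(2π) = 1/3, so the estimate can be checked directly. Increasing `n_points` or `n_shifts` tightens the estimate, which is the sense in which such an approximation can be made arbitrarily precise at the cost of more computation.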

