Solving the "many variables" problem in MICE with principal component regression

Many variables · Principal component regression · Linear regression · Missing values · Monte Carlo

April 21, 2023

Edoardo Costantini, Kyle M. Lang, Klaas Sijtsma, Tim Reeskens

Multiple imputation (MI) is one of the most popular approaches to addressing missing values in questionnaires and surveys. MI with multivariate imputation by chained equations (MICE) allows flexible imputation of many types of data. In MICE, for each variable under imputation, the imputer needs to specify which variables should act as predictors in the imputation model. The selection of these predictors is a difficult but fundamental step in the MI procedure, especially when a data set contains many variables. In this project, we explore the use of principal component regression (PCR) as a univariate imputation method in the MICE algorithm to automatically address the "many variables" problem that arises when imputing large social science data sets. We compare different implementations of PCR-based MICE with a correlation-thresholding strategy by means of a Monte Carlo simulation study and a case study. We find that using PCR on a variable-by-variable basis performs best and that it can come close to the performance of expertly designed imputation procedures.
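To make the idea concrete, here is a minimal sketch of how PCR might serve as the univariate imputation step inside a chained-equations loop. It is not the authors' implementation. The function names (`pcr_impute_column`, `pcr_mice`), the number of components, the mean-based initialization, and the toy data are all assumptions chosen for illustration. For brevity the sketch adds only residual noise to the predictions, whereas a full MI procedure would also draw the regression parameters from their posterior.

```python
import numpy as np


def pcr_impute_column(X, j, miss, n_comp, rng):
    """Impute column j of X by principal component regression (hypothetical helper).

    X holds the current completed data; miss marks originally missing cells.
    """
    obs = ~miss[:, j]
    preds = np.delete(X, j, axis=1)  # all other variables act as predictors

    # Standardize predictors so that PCA is not dominated by scale.
    mu = preds.mean(axis=0)
    sd = preds.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (preds - mu) / sd

    # Principal components via SVD; keep the first n_comp score columns.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_comp].T

    # Regress the observed part of column j on the component scores.
    A_obs = np.column_stack([np.ones(obs.sum()), scores[obs]])
    y_obs = X[obs, j]
    beta, *_ = np.linalg.lstsq(A_obs, y_obs, rcond=None)

    # Residual standard deviation, used to add noise to the imputations.
    resid = y_obs - A_obs @ beta
    dof = max(int(obs.sum()) - A_obs.shape[1], 1)
    sigma = np.sqrt(resid @ resid / dof)

    # Predict the missing rows and add random noise (simplified draw).
    n_mis = int((~obs).sum())
    A_mis = np.column_stack([np.ones(n_mis), scores[~obs]])
    return A_mis @ beta + rng.normal(0.0, sigma, size=n_mis)


def pcr_mice(data, n_comp=2, n_iter=10, seed=0):
    """Chained-equations loop using PCR for every incomplete column (sketch)."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(data)
    X = data.copy()

    # Initialize missing cells with column means (an assumption of this sketch).
    col_means = np.nanmean(data, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])

    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if miss[:, j].any():
                X[miss[:, j], j] = pcr_impute_column(X, j, miss, n_comp, rng)
    return X


# Toy usage: 8 correlated variables driven by 2 latent factors, 10% missing.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
full = latent @ rng.normal(size=(2, 8)) + 0.3 * rng.normal(size=(200, 8))
data = full.copy()
data[rng.random(full.shape) < 0.1] = np.nan
completed = pcr_mice(data, n_comp=2, n_iter=5, seed=0)
```

Because the components are recomputed from all other variables each time a column is imputed, no predictor subset has to be chosen by hand, which is the appeal described in the abstract. Running `pcr_mice` several times with different seeds would yield multiple completed data sets, as MI requires.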
