Functional L-Optimality Subsampling for Massive Data - 专知 (Zhuanzhi) Papers

Subsampling · Functional · Linear model · Extensibility · Reducible

July 6, 2021

Functional L-Optimality Subsampling for Massive Data

Hua Liu, Jinhong You, Jiguo Cao

From arXiv; 37 pages and 15 figures

Massive data pose major memory and computation challenges for analysis. These challenges can be tackled by using subsamples drawn from the full data as a surrogate. For functional data, it is common to collect multiple measurements over their domains, which requires even more memory and computation time when the sample size is large. The computation becomes much more intensive when statistical inference relies on bootstrap samples. To the best of our knowledge, this article is the first attempt to study subsampling for the functional linear model. We propose an optimal subsampling method based on the functional L-optimality criterion. When the response is a discrete or categorical variable, we further extend the proposed functional L-optimality subsampling (FLoS) method to the functional generalized linear model. We establish the asymptotic properties of the estimators obtained by the FLoS method. The finite-sample performance of the FLoS method is investigated through extensive simulation studies. The FLoS method is further demonstrated by analyzing two large-scale datasets: the global climate data and the kidney transplant data. The results on these data show that the FLoS method performs much better than uniform subsampling and approximates the full-data results well while dramatically reducing computation time and memory.
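The abstract does not give the sampling formula, so the following is only a minimal sketch of the general idea behind L-optimality subsampling, written for an ordinary (non-functional) linear model rather than the FLoS method itself. It assumes the form commonly used in the optimal-subsampling literature, where observation i is drawn with probability proportional to |residual_i| · ‖x_i‖ computed from a pilot fit, and the subsample is then fitted by inverse-probability-weighted least squares. The function name `l_opt_subsample_ols`, the pilot size `r0`, the subsample size `r`, and the 10% mixing with uniform probabilities are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def l_opt_subsample_ols(X, y, r0, r, rng):
    """Two-step subsampling sketch for ordinary least squares.

    Assumption: sampling probabilities proportional to
    |pilot residual| * ||x_i||, a form used for L-optimality in the
    general optimal-subsampling literature (not confirmed as FLoS).
    """
    n = X.shape[0]

    # Step 1: pilot estimate from a uniform subsample of size r0.
    pilot_idx = rng.choice(n, size=r0, replace=True)
    beta_pilot, *_ = np.linalg.lstsq(X[pilot_idx], y[pilot_idx], rcond=None)

    # Step 2: approximate L-optimality scores on the full data.
    scores = np.abs(y - X @ beta_pilot) * np.linalg.norm(X, axis=1)
    probs = scores / scores.sum()
    # Assumption: mix with uniform probabilities so no weight explodes.
    probs = 0.9 * probs + 0.1 / n

    # Step 3: draw the subsample and fit weighted least squares
    # with inverse-probability weights 1 / probs.
    idx = rng.choice(n, size=r, replace=True, p=probs)
    sqrt_w = np.sqrt(1.0 / probs[idx])
    beta_hat, *_ = np.linalg.lstsq(
        X[idx] * sqrt_w[:, None], y[idx] * sqrt_w, rcond=None
    )
    return beta_hat, probs


# Usage on synthetic data (all values here are made up for illustration).
rng = np.random.default_rng(42)
n, p = 20_000, 5
X = rng.normal(size=(n, p))
beta_true = np.arange(1, p + 1, dtype=float)
y = X @ beta_true + rng.normal(size=n)

beta_sub, probs = l_opt_subsample_ols(X, y, r0=500, r=1_000, rng=rng)
print("subsample estimate:", np.round(beta_sub, 3))
```

The inverse-probability weights are what keep the subsample estimator approximately unbiased for the full-data fit even though informative points are oversampled. In the functional setting described in the abstract, the covariate vector would presumably be replaced by basis-expansion quantities, which this sketch does not attempt.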

