Approximate Cross-Validation with Low-Rank Data in High Dimensions


Approximation · Model Evaluation · Rank · FAST · Hessian Matrix

November 1, 2022


William T. Stephenson, Madeleine Udell, Tamara Broderick

From arXiv; published in NeurIPS 2020.

Many recent advances in machine learning are driven by a challenging trifecta: large data size $N$, high dimensions, and expensive algorithms. In this setting, cross-validation (CV) serves as an important tool for model assessment. Recent advances in approximate cross-validation (ACV) provide accurate approximations to CV with only a single model fit, avoiding traditional CV's requirement for repeated runs of expensive algorithms. Unfortunately, these ACV methods can lose both speed and accuracy in high dimensions unless sparsity structure is present in the data. Fortunately, there is an alternative type of simplifying structure that is present in most data: approximate low rank (ALR). Guided by this observation, we develop a new algorithm for ACV that is fast and accurate in the presence of ALR data. Our first key insight is that the Hessian matrix, whose inverse forms the computational bottleneck of existing ACV methods, is ALR. We show that, despite our use of the inverse Hessian, a low-rank approximation using the largest (rather than the smallest) matrix eigenvalues enables fast, reliable ACV. Our second key insight is that, in the presence of ALR data, error in existing ACV methods roughly grows with the (approximate, low) rank rather than with the (full, high) dimension. These insights allow us to prove theoretical guarantees on the quality of our proposed algorithm, along with fast-to-compute upper bounds on its error. We demonstrate the speed and accuracy of our method, as well as the usefulness of our bounds, on a range of real and simulated data sets.
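The following is a minimal sketch of the two ingredients the abstract describes: a single full-data fit, followed by a Newton-step approximation to each leave-one-out fit in which the Hessian is replaced by a low-rank approximation built from its largest eigenvalues plus the regularizer. It is an illustration of the general idea under stated assumptions, not the authors' algorithm or code. The model (L2-regularized least squares, i.e. ridge regression), the function names `ridge_fit`, `exact_loo`, and `approx_loo_lowrank`, and the synthetic data are all hypothetical choices made here. Ridge is convenient because, with the exact Hessian, the Newton step reproduces exact leave-one-out CV, so any remaining error comes only from the low-rank approximation.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Minimize 0.5*||y - X @ theta||^2 + 0.5*lam*||theta||^2 in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def exact_loo(X, y, lam):
    """Exact leave-one-out CV: refit the model once per held-out point."""
    n = X.shape[0]
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        theta_i = ridge_fit(X[keep], y[keep], lam)
        errs[i] = (y[i] - X[i] @ theta_i) ** 2
    return errs.mean()


def approx_loo_lowrank(X, y, lam, rank):
    """Newton-step approximate LOO with a low-rank-plus-ridge Hessian (hypothetical sketch).

    The Hessian H = X^T X + lam*I is replaced by V diag(eig) V^T + lam*I,
    where (eig, V) are the `rank` LARGEST eigenpairs of X^T X.
    """
    theta = ridge_fit(X, y, lam)  # the single full-data fit
    # Right singular vectors of X are eigenvectors of X^T X; eigenvalues are s**2.
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    V = vt[:rank].T        # d x K basis of the top-K eigenvectors
    eig = s[:rank] ** 2    # K largest eigenvalues of X^T X

    def h_inv(v):
        # Apply the inverse of the approximate Hessian:
        # scale by 1/(eig + lam) on span(V) and by 1/lam on its orthogonal complement.
        coef = V.T @ v
        return V @ (coef / (eig + lam)) + (v - V @ coef) / lam

    errs = np.empty(X.shape[0])
    for i in range(X.shape[0]):
        x, yi = X[i], y[i]
        grad = -(yi - x @ theta) * x  # gradient of 0.5*(yi - x^T theta)^2 at theta
        # Newton step for the objective without point i:
        #   theta_i = theta + (H - x x^T)^{-1} grad,
        # with the rank-one downdate handled by the Sherman-Morrison formula.
        hx, hg = h_inv(x), h_inv(grad)
        theta_i = theta + hg + hx * (x @ hg) / (1.0 - x @ hx)
        errs[i] = (yi - x @ theta_i) ** 2
    return errs.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical approximately low-rank design: a rank-5 signal plus small noise.
    X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 100)) + 0.01 * rng.normal(size=(200, 100))
    y = X @ rng.normal(size=100) + 0.5 * rng.normal(size=200)
    print("exact LOO:     ", exact_loo(X, y, lam=1.0))
    print("approx, rank 5:", approx_loo_lowrank(X, y, lam=1.0, rank=5))
```

Setting `rank` equal to the rank of `X` recovers exact leave-one-out CV for this model. On approximately low-rank data, a small `rank` should give a close approximation from one fit and one decomposition instead of N refits. The full SVD here is only for simplicity; a practical version would compute just the top-K singular vectors (for example with a truncated or randomized SVD).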


