如何量化数据增强的价值？关于缩放定律、不变性和隐式正则化的研究 (How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization) - 专知论文

会员服务 ·

0

数据增强 · 不变性 · 不变 · 隐式正则化 · 正则化 ·

2023 年 3 月 31 日

How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization

翻译：如何量化数据增强的价值？关于缩放定律、不变性和隐式正则化的研究

Jonas Geiping,Micah Goldblum,Gowthami Somepalli,Ravid Shwartz-Ziv,Tom Goldstein,Andrew Gordon Wilson

from arxiv, 31 pages, 29 figures. To be presented at ICLR 2023. Code at https://github.com/JonasGeiping/dataaugs

Despite the clear performance benefits of data augmentations, little is known about why they are so effective. In this paper, we disentangle several key mechanisms through which data augmentations operate. Establishing an exchange rate between augmented and additional real data, we find that in out-of-distribution testing scenarios, augmentations which yield samples that are diverse, but inconsistent with the data distribution can be even more valuable than additional training data. Moreover, we find that data augmentations which encourage invariances can be more valuable than invariance alone, especially on small and medium sized training sets. Following this observation, we show that augmentations induce additional stochasticity during training, effectively flattening the loss landscape.

翻译：尽管数据增强的性能优势明显，但对于其为什么如此有效的原因知之甚少。在本文中，我们分离了数据增强发挥作用的几个关键机制。通过建立增强数据和额外真实数据之间的兑换率，我们发现在离分布外的测试场景中，产生多样性但与数据分布不一致的样本的增强技术可以比额外的训练数据更有价值。此外，我们发现鼓励不变性的数据增强技术比仅仅追求不变性在中小型训练集上更有价值。根据这个观察结果，我们展示了数据增强技术在训练期间引入了额外的随机性，有效地降低了损失函数的复杂度。

0

相关内容

数据增强

数据增强在机器学习领域多指采用一些方法（比如数据蒸馏，正负样本均衡等）来提高模型数据集的质量，增强数据。

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

近期必读的6篇 NeurIPS 2019 的零样本学习(Zero-Shot Learning)论文

近期必读的6篇 NeurIPS 2019 的零样本学习(Zero-Shot Learning)论文

专知会员服务

60+阅读 · 2019年12月24日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

离子注入合成In纳米颗粒在Al薄膜中超导性质的研究

国家自然科学基金

0+阅读 · 2015年12月31日

高维高频数据下金融资产积分波动率矩阵的统计分析

国家自然科学基金

2+阅读 · 2015年12月31日

调和函数水平集的几何性质

国家自然科学基金

0+阅读 · 2013年12月31日

Ghrelin对胰岛β细胞分泌胰岛素和增殖的影响及分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

铁基超导体的电子结构实验研究

国家自然科学基金

0+阅读 · 2012年12月31日

Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability

Arxiv

0+阅读 · 2023年5月19日

Less Can Be More: Unsupervised Graph Pruning for Large-scale Dynamic Graphs

Arxiv

0+阅读 · 2023年5月18日

Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks

Arxiv

0+阅读 · 2023年5月17日

A Comparative Study for Unsupervised Network Representation Learning

Arxiv

24+阅读 · 2020年3月11日

Text Generation from Knowledge Graphs with Graph Transformers

Arxiv

35+阅读 · 2019年4月4日

VIP会员

文章信息

相关主题

隐式正则化

相关VIP内容

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

近期必读的6篇 NeurIPS 2019 的零样本学习(Zero-Shot Learning)论文

近期必读的6篇 NeurIPS 2019 的零样本学习(Zero-Shot Learning)论文

专知会员服务

60+阅读 · 2019年12月24日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

因果强化学习的统一框架：综述、分类体系、算法与应用

《无人机系统 - 反无人机系统：测试方法》364页

【MIT博士论文】语言模型的推理时学习算法

美军低成本无人作战攻击系统（LUCAS）：扩大无人机战争规模

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability

Arxiv

0+阅读 · 2023年5月19日

Less Can Be More: Unsupervised Graph Pruning for Large-scale Dynamic Graphs

Arxiv

0+阅读 · 2023年5月18日

Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks

Arxiv

0+阅读 · 2023年5月17日

A Comparative Study for Unsupervised Network Representation Learning

Arxiv

24+阅读 · 2020年3月11日

Text Generation from Knowledge Graphs with Graph Transformers

Arxiv

35+阅读 · 2019年4月4日

相关基金

离子注入合成In纳米颗粒在Al薄膜中超导性质的研究

国家自然科学基金

0+阅读 · 2015年12月31日

高维高频数据下金融资产积分波动率矩阵的统计分析

国家自然科学基金

2+阅读 · 2015年12月31日

调和函数水平集的几何性质

国家自然科学基金

0+阅读 · 2013年12月31日

Ghrelin对胰岛β细胞分泌胰岛素和增殖的影响及分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

铁基超导体的电子结构实验研究

国家自然科学基金

0+阅读 · 2012年12月31日

微信扫码咨询专知VIP会员