破碎的神经放大法 (Broken Neural Scaling Laws) - 专知论文

会员服务 ·

0

缩放 · 泛函 · Learning · 情景 · MoDELS ·

2023 年 1 月 23 日

Broken Neural Scaling Laws

翻译：破碎的神经放大法

Ethan Caballero,Kshitij Gupta,Irina Rish,David Krueger

We present a smoothly broken power law functional form that accurately models and extrapolates the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as the amount of compute used for training, number of model parameters, training dataset size, or upstream performance varies) for various architectures and for each of various tasks within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. This set includes large-scale vision, language, audio, video, diffusion generative modeling, multimodal learning, contrastive learning, AI alignment, robotics, arithmetic, unsupervised/self-supervised learning, and reinforcement learning (single agent and multi-agent). When compared to other functional forms for neural scaling behavior, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set. Moreover, this functional form accurately models and extrapolates scaling behavior that other functional forms are incapable of expressing such as the non-monotonic transitions present in the scaling behavior of phenomena such as double descent and the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. Code is available at https://github.com/ethancaballero/broken_neural_scaling_laws

翻译：我们展示了一种顺利破碎的权力法功能形式,它精确地模型和外推了深神经网络的缩放行为(即,由于用于培训的计算数量、模型参数数量、培训数据集大小或上游绩效的不同,评估利息的衡量尺度也因培训的计算数量、模型参数数量、培训数据集大小或上游绩效的不同而不同),对于各种结构以及大型和多样化的上游和下游任务中的每一项任务,以零射、促动和微调的设置的形式,呈现出一种顺利的分级行为。这套功能形式包括大规模视觉、语言、音频、视频、传播基因建模、多式联运学习、对比学习、AI调整、机器人、算术、不受监督/自我监督的学习以及强化学习(单一剂和多剂)的量度衡量标准。与神经缩放行为的其他功能形式相比,这种功能形式产生了比尺度行为外推法,而在这个设置上则相当准确得多。此外,这种功能形式模型和外推法行为,即其他功能形式无法表达诸如双位下降和延迟、直视-直视-直视-直视-直观等现象等现象。

0

相关内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

48+阅读 · 2022年10月2日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

59+阅读 · 2022年4月22日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

25+阅读 · 2021年4月2日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

161+阅读 · 2020年3月18日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

168+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

90+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

77+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

64+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

99+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

39+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

1+阅读 · 2022年11月2日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

26+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

17+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

16+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【推荐】图像分类必读开创性论文汇总

【推荐】图像分类必读开创性论文汇总

机器学习研究会

14+阅读 · 2017年8月15日

蓖麻矮化相关RcDof基因功能分析及调控机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Al-Cr-Si系中十次准晶体原位三维晶体结构的电子断层成像三维重构

国家自然科学基金

0+阅读 · 2014年12月31日

调和稳定-GARCH模型下期权定价和风险管理研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于原位溶胶-凝胶法的木材-有机-无机杂化纳米复合材料可控制备规律与界面结合机制

国家自然科学基金

0+阅读 · 2013年12月31日

microRNA调节肿瘤抑制因子Caliban应答DNA损伤的机制

国家自然科学基金

1+阅读 · 2012年12月31日

典型生物耐磨、止裂刚柔耦合表面的多尺度力学机制

国家自然科学基金

0+阅读 · 2012年12月31日

有机给体-受体共晶的双极性输运性能研究

国家自然科学基金

0+阅读 · 2012年12月31日

新型炭/半导体纳米复合结构材料的可控制备与性能研究

国家自然科学基金

0+阅读 · 2011年12月31日

hMSCs定向汗腺细胞分化中TRAF6信号复合物活化不同NF-κB通路的机制

国家自然科学基金

0+阅读 · 2011年12月31日

排队模型与可靠性模型的时间依赖解的结构研究

国家自然科学基金

0+阅读 · 2008年12月31日

AdaptiveNet: Post-deployment Neural Architecture Adaptation for Diverse Edge Environments

Arxiv

0+阅读 · 2023年3月13日

Measurement-Consistent Networks via a Deep Implicit Layer for Solving Inverse Problems

Arxiv

0+阅读 · 2023年3月13日

Policy-Guided Lazy Search with Feedback for Task and Motion Planning

Arxiv

0+阅读 · 2023年3月11日

Just Flip: Flipped Observation Generation and Optimization for Neural Radiance Fields to Cover Unobserved View

Arxiv

0+阅读 · 2023年3月11日

Moving Fast With Broken Data

Arxiv

0+阅读 · 2023年3月10日

Zero-One Laws of Graph Neural Networks

Arxiv

0+阅读 · 2023年3月10日

On Neural Differential Equations

Arxiv

22+阅读 · 2022年2月4日

Updating Embeddings for Dynamic Knowledge Graphs

Arxiv

20+阅读 · 2021年9月22日

A Wholistic View of Continual Learning with Deep Neural Networks: Forgotten Lessons and the Bridge to Active and Open World Learning

Arxiv

35+阅读 · 2020年9月3日

Evolving Losses for Unsupervised Video Representation Learning

Arxiv

23+阅读 · 2020年2月26日

VIP会员

文章信息

相关主题

相关VIP内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

48+阅读 · 2022年10月2日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

59+阅读 · 2022年4月22日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

25+阅读 · 2021年4月2日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

161+阅读 · 2020年3月18日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

168+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

90+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

77+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

64+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

99+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

39+阅读 · 2019年10月9日

热门VIP内容

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

1+阅读 · 2022年11月2日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

26+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

17+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

16+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【推荐】图像分类必读开创性论文汇总

【推荐】图像分类必读开创性论文汇总

机器学习研究会

14+阅读 · 2017年8月15日

相关论文

AdaptiveNet: Post-deployment Neural Architecture Adaptation for Diverse Edge Environments

Arxiv

0+阅读 · 2023年3月13日

Measurement-Consistent Networks via a Deep Implicit Layer for Solving Inverse Problems

Arxiv

0+阅读 · 2023年3月13日

Policy-Guided Lazy Search with Feedback for Task and Motion Planning

Arxiv

0+阅读 · 2023年3月11日

Just Flip: Flipped Observation Generation and Optimization for Neural Radiance Fields to Cover Unobserved View

Arxiv

0+阅读 · 2023年3月11日

Moving Fast With Broken Data

Arxiv

0+阅读 · 2023年3月10日

Zero-One Laws of Graph Neural Networks

Arxiv

0+阅读 · 2023年3月10日

On Neural Differential Equations

Arxiv

22+阅读 · 2022年2月4日

Updating Embeddings for Dynamic Knowledge Graphs

Arxiv

20+阅读 · 2021年9月22日

A Wholistic View of Continual Learning with Deep Neural Networks: Forgotten Lessons and the Bridge to Active and Open World Learning

Arxiv

35+阅读 · 2020年9月3日

Evolving Losses for Unsupervised Video Representation Learning

Arxiv

23+阅读 · 2020年2月26日

相关基金

蓖麻矮化相关RcDof基因功能分析及调控机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Al-Cr-Si系中十次准晶体原位三维晶体结构的电子断层成像三维重构

国家自然科学基金

0+阅读 · 2014年12月31日

调和稳定-GARCH模型下期权定价和风险管理研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于原位溶胶-凝胶法的木材-有机-无机杂化纳米复合材料可控制备规律与界面结合机制

国家自然科学基金

0+阅读 · 2013年12月31日

microRNA调节肿瘤抑制因子Caliban应答DNA损伤的机制

国家自然科学基金

1+阅读 · 2012年12月31日

典型生物耐磨、止裂刚柔耦合表面的多尺度力学机制

国家自然科学基金

0+阅读 · 2012年12月31日

有机给体-受体共晶的双极性输运性能研究

国家自然科学基金

0+阅读 · 2012年12月31日

新型炭/半导体纳米复合结构材料的可控制备与性能研究

国家自然科学基金

0+阅读 · 2011年12月31日

hMSCs定向汗腺细胞分化中TRAF6信号复合物活化不同NF-κB通路的机制

国家自然科学基金

0+阅读 · 2011年12月31日

排队模型与可靠性模型的时间依赖解的结构研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员