Neural networks trained with stochastic gradient descent (SGD) starting from different random initialisations typically find functionally very similar solutions, raising the question of whether there are meaningful differences between different SGD solutions. Entezari et al. recently conjectured that despite different initialisations, the solutions found by SGD lie in the same loss valley after taking into account the permutation invariance of neural networks. Concretely, they hypothesise that any two solutions found by SGD can be permuted such that the linear interpolation between their parameters forms a path without significant increases in loss. Here, we use a simple but powerful algorithm to find such permutations, which allows us to obtain direct empirical evidence that the hypothesis is true in fully connected networks. Strikingly, we find that two networks already live in the same loss valley at the time of initialisation, and that averaging their random but suitably permuted initialisations performs significantly above chance. In contrast, for convolutional architectures, our evidence suggests that the hypothesis does not hold. Especially in the large learning rate regime, SGD seems to discover diverse modes.
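To make the setup concrete, the following is a minimal sketch, assuming a one-hidden-layer ReLU network and a simple weight-based matching of hidden units solved with the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`). It illustrates permuting one network's hidden units to align it with another and then evaluating the loss along the linear interpolation path; it is an illustration of the general idea, not necessarily the exact permutation-finding algorithm used here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical one-hidden-layer MLP parameters (W1, b1, W2) with
# hidden activations h = relu(W1 @ x + b1) and output W2 @ h.
def permute_hidden_units(params_a, params_b):
    """Permute the hidden units of network B to best match network A.

    Matching maximises the similarity of incoming and outgoing weight
    vectors via the Hungarian algorithm; this is one simple way to
    account for permutation invariance (an illustrative choice).
    """
    (W1a, b1a, W2a), (W1b, b1b, W2b) = params_a, params_b
    # Similarity between hidden unit i of A and hidden unit j of B.
    sim = W1a @ W1b.T + W2a.T @ W2b
    _, perm = linear_sum_assignment(-sim)  # maximise total similarity
    return W1b[perm], b1b[perm], W2b[:, perm]

def interpolate(params_a, params_b, alpha):
    """Linear interpolation (1 - alpha) * A + alpha * B, element-wise."""
    return tuple((1 - alpha) * pa + alpha * pb
                 for pa, pb in zip(params_a, params_b))

def mlp_loss(params, X, y):
    """Mean squared error of the one-hidden-layer ReLU network."""
    W1, b1, W2 = params
    h = np.maximum(W1 @ X.T + b1[:, None], 0.0)
    pred = (W2 @ h).T
    return float(np.mean((pred - y) ** 2))

# Toy usage: compare the loss along the interpolation path before and
# after permuting network B (random weights just to keep it runnable).
rng = np.random.default_rng(0)
d, h, o, n = 10, 32, 1, 256
X, y = rng.normal(size=(n, d)), rng.normal(size=(n, o))
make = lambda: (rng.normal(size=(h, d)), rng.normal(size=h), rng.normal(size=(o, h)))
params_a, params_b = make(), make()
params_b_perm = permute_hidden_units(params_a, params_b)
for alpha in np.linspace(0.0, 1.0, 5):
    print(alpha,
          mlp_loss(interpolate(params_a, params_b, alpha), X, y),
          mlp_loss(interpolate(params_a, params_b_perm, alpha), X, y))
```

Note that the permutation leaves network B's function unchanged (rows of W1 and b1 and columns of W2 are reordered consistently), so only the loss at interpolated points between the two networks can differ before and after matching.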