HyperMixer: An MLP-based Low Cost Alternative to Transformers


May 25, 2023


Florian Mai, Arnaud Pannatier, Fabio Fehr, Haolin Chen, Francois Marelli, Francois Fleuret, James Henderson

From arXiv. Published at ACL 2023.

Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning.
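The contrast the abstract draws between a static token-mixing MLP and one formed dynamically by hypernetworks can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names, the two-layer per-token hypernetwork, the tanh approximation of GELU, and the random, untrained weights are all hypothetical. Position information and layer normalization are left out.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def static_token_mixing(X, W1, W2):
    # MLPMixer-style mixing: W1 and W2 have shape (N, h) and are fixed
    # parameters, so the layer is tied to one sequence length N.
    # X has shape (N, d); every feature column is mixed by the same MLP.
    return W2 @ gelu(W1.T @ X)

def make_hypernet(rng, d, h):
    # Hypothetical hypernetwork: a two-layer MLP applied to each token
    # independently, mapping (N, d) -> (N, h). Its parameter count does
    # not depend on N.
    A = rng.normal(scale=d ** -0.5, size=(d, d))
    B = rng.normal(scale=d ** -0.5, size=(d, h))
    return lambda X: gelu(X @ A) @ B

def dynamic_token_mixing(X, hyper1, hyper2):
    # Hypernetwork-based mixing: the mixing weights are generated from the
    # input itself, one row per token, then used like the static weights.
    W1 = hyper1(X)               # (N, h)
    W2 = hyper2(X)               # (N, h)
    return W2 @ gelu(W1.T @ X)   # (N, d); cost grows linearly with N

rng = np.random.default_rng(0)
d, h = 8, 4
hyper1 = make_hypernet(rng, d, h)
hyper2 = make_hypernet(rng, d, h)

# The same generated-weight layer handles sequences of different lengths.
for n_tokens in (5, 12):
    X = rng.normal(size=(n_tokens, d))
    Y = dynamic_token_mixing(X, hyper1, hyper2)
    print(n_tokens, Y.shape)
```

Because the sketch generates each row of the mixing weights from a single token and omits position information, permuting the input tokens simply permutes the output rows. A practical model would need to inject position information to break that symmetry.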


