Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR


Conformer · Speech recognition · Inference · Cascade · Attention

March 31, 2023


Rami Botros, Anmol Gulati, Tara N. Sainath, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu

Conformer models maintain a large number of internal states, the vast majority of which are associated with self-attention layers. With limited memory bandwidth, reading these states from memory at each inference step can slow down inference. In this paper, we design an optimized Conformer that is small enough to meet on-device restrictions and has fast inference on TPUs. We explore various ideas to improve the execution speed, including replacing lower Conformer blocks with convolution-only blocks, strategically downsizing the architecture, and utilizing an RNNAttention-Performer. Our optimized Conformer can be readily incorporated into a cascaded-encoder setting, allowing a second-pass decoder to operate on its output and improve accuracy whenever more resources are available. Altogether, we find that these optimizations can reduce latency by a factor of 6.8, and come at a reasonable trade-off in quality. With the cascaded second pass, we show that the recognition accuracy is completely recoverable. Thus, our proposed encoder can double as a strong standalone encoder on device, and as the first part of a high-performance ASR pipeline.
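The abstract does not spell out how the RNNAttention-Performer works, so the following is only a minimal sketch of the general idea that linear (kernelized) causal attention can be run as a recurrence with a fixed-size state, instead of re-reading a growing cache of keys and values at every step. The function names and the `elu(x) + 1` feature map are assumptions chosen for illustration; they are not taken from the paper, and Performer itself uses random-feature maps.

```python
import math


def feature_map(x):
    # Positive feature map elu(x) + 1. This is an illustrative assumption,
    # not the random-feature map used by Performer.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def quadratic_causal_attention(queries, keys, values):
    """Reference form: step t re-reads every key and value up to t."""
    dim_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        fq = feature_map(q)
        weights = [
            sum(a * b for a, b in zip(fq, feature_map(k)))
            for k in keys[: t + 1]
        ]
        total = sum(weights)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values[: t + 1])) / total
            for j in range(dim_v)
        ])
    return outputs


def recurrent_causal_attention(queries, keys, values):
    """RNN form: a fixed-size state (S, z) is updated once per step."""
    dim_f = len(keys[0])
    dim_v = len(values[0])
    S = [[0.0] * dim_v for _ in range(dim_f)]  # running sum of phi(k) v^T
    z = [0.0] * dim_f                          # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        fk = feature_map(k)
        for i in range(dim_f):
            z[i] += fk[i]
            for j in range(dim_v):
                S[i][j] += fk[i] * v[j]
        fq = feature_map(q)
        denom = sum(a * b for a, b in zip(fq, z))
        outputs.append([
            sum(fq[i] * S[i][j] for i in range(dim_f)) / denom
            for j in range(dim_v)
        ])
    return outputs
```

Both functions compute the same outputs up to floating-point rounding. The recurrent form keeps only `S` and `z`, whose size is independent of how many frames have been processed, which is the property that matters when memory bandwidth limits per-step inference.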

