The recently proposed conformer architecture has been used successfully in end-to-end automatic speech recognition (ASR) systems, achieving state-of-the-art performance on several datasets. To the best of our knowledge, the impact of using a conformer acoustic model for hybrid ASR has not been investigated. In this paper, we present and evaluate a competitive conformer-based hybrid model training recipe. We study different training aspects and methods to improve word error rate (WER) and to increase training speed. We apply time downsampling methods for efficient training and use transposed convolutions to upsample the output sequence back to the input frame rate. We conduct experiments on the Switchboard 300h dataset, and our conformer-based hybrid model achieves competitive results compared to other architectures. It generalizes very well on the Hub5'01 test set and significantly outperforms the BLSTM-based hybrid model.
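The downsample-then-upsample idea can be illustrated with a small, dependency-free sketch. It is not the paper's implementation: the downsampling factor of 3, the kernel values, and the helper names `strided_conv1d`, `transposed_conv1d`, and `match_length` are all assumptions chosen for illustration. The sketch shows why the upsampled sequence may come out slightly shorter than the input, so that a final pad or trim is needed before frame-level targets can be aligned.

```python
def strided_conv1d(x, kernel, stride):
    """Downsample a 1-D sequence with a strided convolution (no padding)."""
    n_out = (len(x) - len(kernel)) // stride + 1
    return [
        sum(x[i * stride + k] * w for k, w in enumerate(kernel))
        for i in range(n_out)
    ]


def transposed_conv1d(x, kernel, stride):
    """Upsample a 1-D sequence with a transposed convolution (no padding).

    Output length is (len(x) - 1) * stride + len(kernel).
    """
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        for k, w in enumerate(kernel):
            # Each input frame is "spread" over `len(kernel)` output frames.
            out[i * stride + k] += value * w
    return out


def match_length(x, target_len):
    """Trim, or pad by repeating the last frame, to reach target_len.

    Assumes x is non-empty.
    """
    if len(x) >= target_len:
        return x[:target_len]
    return x + [x[-1]] * (target_len - len(x))


# Hypothetical setup: 100 input frames, downsampling factor 3 (an assumption).
factor = 3
frames = [float(t) for t in range(100)]

# Downsample in time; the expensive encoder layers would run at this rate.
down = strided_conv1d(frames, [1.0 / factor] * factor, factor)

# Upsample again so that there is one output per original input frame.
up = transposed_conv1d(down, [1.0] * factor, factor)
aligned = match_length(up, len(frames))

print(len(frames), len(down), len(up), len(aligned))
```

With these assumed settings the 100 input frames become 33 downsampled frames, and the transposed convolution restores only 99 of them, which is why the last step pads the sequence back to the original length. In a trained model the kernels would be learned rather than fixed.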


