Recently, end-to-end ASR based either on sequence-to-sequence networks or on the CTC objective function has gained a lot of interest from the community, achieving results competitive with traditional systems that rely on robust but complex pipelines. One of the main features of end-to-end systems, in addition to their ability to do without extra linguistic resources such as dictionaries or language models, is the capacity to model acoustic units such as characters, subwords, or words directly. This opens up the possibility of directly translating speech with different representations or levels of knowledge depending on the target language. In this paper, we review the existing end-to-end ASR approaches for the French language. We compare their results to conventional state-of-the-art ASR systems and discuss which units are better suited to modeling French.