Recognition of Hungarian conversational telephone speech is challenging due to the informal style and morphological richness of the language. Recurrent neural network language models (RNNLMs) can mitigate the high perplexity of the task; however, two-pass decoding introduces a considerable processing delay. To eliminate this delay, we investigate approaches that reduce the complexity of the RNNLM while preserving its accuracy. We compare the performance of conventional back-off n-gram language models (BNLMs), BNLM approximations of RNNLMs (RNN-BNLMs), and RNN n-grams in terms of perplexity and word error rate (WER). Morphological richness is often addressed by using statistically derived subwords (morphs) in the language models, so we extend our investigation to morph-based models as well. We found that RNN-BNLMs recover 40% of the RNNLM perplexity reduction, which is roughly equal to the performance of an RNN 4-gram model. By combining morph-based modeling with the approximation of the RNNLM, we achieved an 8% relative WER reduction while preserving real-time operation of our conversational telephone speech recognition system.
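The two headline figures above are ratios, and the following minimal sketch shows one way they could be computed. It assumes that "recovered perplexity reduction" means the share of the RNNLM's gain over the baseline BNLM that the approximated model retains; the abstract does not spell out this definition. The function names and all numeric values are hypothetical and chosen only to illustrate the arithmetic, not taken from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of a lower-is-better metric such as WER or perplexity."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


def recovered_fraction(baseline: float, approx: float, full: float) -> float:
    """Share of the full model's gain over the baseline kept by the approximation.

    Assumed definition: (baseline - approx) / (baseline - full).
    """
    gain = baseline - full
    if gain <= 0:
        raise ValueError("full model must improve on the baseline")
    return (baseline - approx) / gain


# Hypothetical perplexities: baseline BNLM, RNN-BNLM approximation, full RNNLM.
ppl_bnlm, ppl_rnn_bnlm, ppl_rnnlm = 100.0, 92.0, 80.0
recovered = recovered_fraction(ppl_bnlm, ppl_rnn_bnlm, ppl_rnnlm)
print(f"recovered share of perplexity reduction: {recovered:.0%}")

# Hypothetical WERs (in percent) before and after the language model change.
wer_baseline, wer_improved = 25.0, 23.0
wer_gain = relative_reduction(wer_baseline, wer_improved)
print(f"relative WER reduction: {wer_gain:.0%}")
```

Under this reading, a relative WER reduction is measured against the baseline WER rather than as an absolute difference in percentage points.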