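Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methods and technologies enabling computers to recognize spoken language and convert it into text. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT). It draws on knowledge and research from computer science, linguistics, and computer engineering.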

VIP Content

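Speech recognition is the entry point to human-computer interaction: it is the ability of a machine or program to receive and interpret sound, or to understand and carry out spoken commands. In the era of intelligent systems, more and more applications adopt conversation as the primary form of interaction when designing personalized interfaces. A complete conversational interaction is a closed loop of three steps: "listen, understand, respond". "Listening" requires automatic speech recognition (ASR), "understanding" requires natural language processing (NLP), and "responding" requires speech synthesis (text-to-speech, TTS). The three steps are tightly linked and depend on one another. Speech recognition is the starting point of the conversation, and it is the foundation for keeping the interaction efficient and accurate.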
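The three-step loop described above can be sketched as a simple composition of functions. This is only an illustration: `make_dialogue_turn`, `toy_asr`, `toy_nlp`, and `toy_tts` are hypothetical names, and the toy stages are placeholders (they pass bytes and strings around) rather than real recognition, understanding, or synthesis models.

```python
from typing import Callable

def make_dialogue_turn(
    asr: Callable[[bytes], str],   # "listen": audio -> transcript
    nlp: Callable[[str], str],     # "understand": transcript -> reply text
    tts: Callable[[str], bytes],   # "respond": reply text -> audio
) -> Callable[[bytes], bytes]:
    """Chain the three stages into one closed-loop dialogue turn."""
    def turn(audio: bytes) -> bytes:
        transcript = asr(audio)
        reply = nlp(transcript)
        return tts(reply)
    return turn

# Toy stand-ins (assumptions for illustration only): the "audio" is just
# UTF-8 text, so no real signal processing happens here.
def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8")

def toy_nlp(text: str) -> str:
    return "hello" if "hi" in text.lower() else "sorry, I did not catch that"

def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")

turn = make_dialogue_turn(toy_asr, toy_nlp, toy_tts)
print(turn(b"Hi there"))
```

Because every later stage consumes the output of the one before it, an error in the ASR stage propagates through the whole turn, which is why the text calls recognition the foundation of the interaction.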

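Today's item is the "China AI Speech Recognition Market Research Report" (《中国AI语音识别市场研究报告》) from Frost & Sullivan (沙利文). Building on an understanding of speech recognition, the report analyzes the Chinese AI speech recognition market along several dimensions, including technology areas, industry sectors, and market participants. It examines the market's growth drivers, main trends, barriers to entry, and key success factors, and it assesses the growth capability of the leading Chinese AI speech recognition vendors, offering a reference for both providers and users of AI speech recognition in China.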


Latest Papers

To join the advantages of classical and end-to-end approaches for speech recognition, we present a simple, novel and competitive approach for phoneme-based neural transducer modeling. Different alignment label topologies are compared and word-end-based phoneme label augmentation is proposed to improve performance. Utilizing the local dependency of phonemes, we adopt a simplified neural network structure and a straightforward integration with the external word-level language model to preserve the consistency of seq-to-seq modeling. We also present a simple, stable and efficient training procedure using frame-wise cross-entropy loss. A phonetic context size of one is shown to be sufficient for the best performance. A simplified scheduled sampling approach is applied for further improvement and different decoding approaches are briefly compared. The overall performance of our best model is comparable to state-of-the-art (SOTA) results for the TED-LIUM Release 2 and Switchboard corpora.
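The abstract mentions training with a frame-wise cross-entropy loss. As a minimal sketch of that general idea, and not the authors' implementation, the function below averages the negative log-probability of the target label over frames. It assumes a fixed alignment that assigns one label index to every frame is already available; the abstract does not say how that alignment is obtained, and the names `log_softmax` and `framewise_cross_entropy` are hypothetical.

```python
import math
from typing import List, Sequence

def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one frame's label scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def framewise_cross_entropy(
    frame_logits: Sequence[Sequence[float]],  # shape: frames x labels
    alignment: Sequence[int],                 # one target label index per frame
) -> float:
    """Average negative log-probability of the aligned label per frame."""
    if len(frame_logits) != len(alignment):
        raise ValueError("need exactly one target label per frame")
    if not alignment:
        raise ValueError("alignment must not be empty")
    total = 0.0
    for logits, label in zip(frame_logits, alignment):
        total -= log_softmax(logits)[label]
    return total / len(alignment)

# Assumed toy setup: 3 labels (index 0 as blank, 1 and 2 as phonemes).
# With uniform scores every label has probability 1/3, so the loss is ln 3.
uniform_loss = framewise_cross_entropy([[0.0, 0.0, 0.0]] * 2, [1, 2])
print(uniform_loss)
```

Because each frame contributes an independent term, this loss needs no summation over alignments during training, which is one reason a frame-wise criterion can be simpler to compute than a full-sum sequence loss.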
