Automatic Speech Recognition (ASR) is the interdisciplinary subfield of computational linguistics that develops methodologies and technologies enabling computers to recognize and translate spoken language into text. It draws on knowledge and research from linguistics, computer science, and electrical engineering. Sentiment analysis is the contextual mining of text that identifies and extracts subjective information from source material, helping a business understand the social sentiment around its brand, product, or service while monitoring online conversations.

According to the structure of speech, three models are used in speech recognition to perform the match: an acoustic model, a phonetic dictionary, and a language model. Any speech recognition program is evaluated on two factors: accuracy (the percentage of errors in converting spoken words to digital data) and speed (the extent to which the program can keep up with a human speaker).

For the purpose of converting speech to text (STT), we study the following open-source toolkits: CMU Sphinx and Kaldi. The toolkits use Mel-Frequency Cepstral Coefficients (MFCC) and i-vectors for feature extraction. CMU Sphinx is used with pre-trained Hidden Markov Models (HMM) and Gaussian Mixture Models (GMM), while Kaldi is used with pre-trained neural networks (NNET) as acoustic models. The n-gram language models contain the phonemes or pdf-ids used to generate the most probable hypothesis (transcription) in the form of a lattice. The speech dataset is stored as .raw or .wav files and is transcribed into .txt files. The system then tries to identify opinions within the transcribed text and extract the following attributes: polarity (whether the speaker expresses a positive or negative opinion) and keywords (the thing that is being talked about).
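To make the MFCC pipeline concrete, here is a minimal NumPy sketch of the standard stages (pre-emphasis, framing, windowing, power spectrum, mel filterbank, log, DCT). The frame size, hop, and filter counts are illustrative defaults, not values taken from the toolkits' configurations; production systems typically rely on the feature extractors built into CMU Sphinx or Kaldi.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_mels=26, n_ceps=13):
    """Sketch of MFCC extraction: frame -> window -> power spectrum
    -> mel filterbank -> log -> DCT. Parameter defaults are illustrative."""
    # Pre-emphasis boosts high frequencies.
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Split the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank, equally spaced on the mel scale.
    mel_pts = np.linspace(0, 2595 * np.log10(1 + (sr / 2) / 700), n_mels + 2)
    hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the filterbank energies; keep n_ceps coefficients.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return log_energy @ dct.T

# One second of a 440 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000
feats = mfcc(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # (98, 13): one 13-dimensional vector per 10 ms frame
```

Each row is the feature vector for one frame; these vectors (plus deltas and, in Kaldi, appended i-vectors) are what the acoustic model scores.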
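The HMM acoustic models mentioned above are decoded with the Viterbi algorithm, which finds the most likely state sequence given per-frame emission scores. The following is a toy sketch with a two-state model and hand-picked probabilities (not a real phone HMM from either toolkit); in practice the emission log-likelihoods come from GMMs (Sphinx) or neural networks (Kaldi).

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Most likely state path for an HMM.
    log_emit has shape (T, S): log p(frame_t | state_s)."""
    T, S = log_emit.shape
    delta = log_init + log_emit[0]          # best score ending in each state
    back = np.zeros((T, S), dtype=int)      # backpointers for path recovery
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # (S, S): from-state x to-state
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy 2-state left-to-right model: early frames favor state 0, later state 1.
log_init = np.log([0.9, 0.1])
log_trans = np.log([[0.7, 0.3], [0.2, 0.8]])
log_emit = np.log([[0.8, 0.2], [0.8, 0.2], [0.2, 0.8], [0.2, 0.8]])
print(viterbi(log_init, log_trans, log_emit))  # [0, 0, 1, 1]
```

Real decoders keep many competing paths rather than a single best one, which is exactly what the lattice output described above represents.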
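The n-gram language model's role is to rank competing hypotheses from the lattice by how probable they are as word sequences. A minimal bigram model with add-one smoothing can be sketched as follows; the tiny corpus is invented for illustration, whereas the toolkits load large pre-built n-gram models in ARPA or FST form.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over whitespace-tokenized sentences,
    with <s>/</s> sentence-boundary markers."""
    uni, bi = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent.split() + ["</s>"]
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    return uni, bi

def score(sent, uni, bi, vocab_size):
    """Add-one smoothed log-probability of a sentence under the bigram model."""
    toks = ["<s>"] + sent.split() + ["</s>"]
    return sum(math.log((bi[(a, b)] + 1) / (uni[a] + vocab_size))
               for a, b in zip(toks, toks[1:]))

corpus = ["the cat sat", "the cat ran", "a dog ran"]
uni, bi = train_bigram(corpus)
V = len(uni)
# A fluent hypothesis outscores a scrambled one with the same words.
print(score("the cat sat", uni, bi, V) > score("sat cat the", uni, bi, V))  # True
```

During decoding this language-model score is combined with the acoustic score, and the hypothesis with the best combined score is emitted as the transcription.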
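Finally, the polarity-and-keywords step on the transcription can be illustrated with a lexicon-based sketch. The word lists here are hypothetical stand-ins; real sentiment systems use much larger lexicons or trained classifiers.

```python
# Hypothetical mini-lexicons for illustration only.
POSITIVE = {"great", "good", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "awful"}
STOPWORDS = {"the", "is", "was", "a", "an", "i", "it", "this", "and", "of"}

def analyze(text):
    """Return (polarity, keywords) for a transcribed utterance.
    Polarity: sign of (positive hits - negative hits).
    Keywords: remaining content words, i.e. what is being talked about."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    sent_score = (sum(t in POSITIVE for t in tokens)
                  - sum(t in NEGATIVE for t in tokens))
    polarity = ("positive" if sent_score > 0
                else "negative" if sent_score < 0 else "neutral")
    keywords = [t for t in tokens
                if t and t not in POSITIVE | NEGATIVE | STOPWORDS]
    return polarity, keywords

print(analyze("The battery life is terrible"))  # ('negative', ['battery', 'life'])
```

The polarity answers whether the speaker's opinion is positive or negative, and the keywords identify the brand, product, or feature the opinion targets.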