Automatic Speech Recognition (ASR) is the task of using computers to automatically convert speech into text. In practical applications, speech recognition is usually combined with technologies such as natural language understanding, natural language generation, and speech synthesis to provide a natural, fluent speech-based method of human-computer interaction.
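As a rough illustration of how these components fit together, below is a minimal sketch of such a voice interaction loop; recognize, understand, generate_reply, and synthesize are hypothetical placeholders, not any particular toolkit's API.

```python
# Minimal sketch of a voice interaction loop: ASR -> NLU -> NLG -> TTS.
# All four stage functions are hypothetical stand-ins for real components.

def recognize(audio: bytes) -> str:
    """Hypothetical ASR stage: convert speech audio to text."""
    raise NotImplementedError

def understand(text: str) -> dict:
    """Hypothetical NLU stage: map text to an intent and slots."""
    raise NotImplementedError

def generate_reply(intent: dict) -> str:
    """Hypothetical NLG stage: produce a textual response."""
    raise NotImplementedError

def synthesize(text: str) -> bytes:
    """Hypothetical TTS stage: convert the response text back to audio."""
    raise NotImplementedError

def interact(audio: bytes) -> bytes:
    text = recognize(audio)          # speech -> text (ASR)
    intent = understand(text)        # text -> meaning (NLU)
    reply = generate_reply(intent)   # meaning -> response text (NLG)
    return synthesize(reply)         # response text -> speech (TTS)
```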


Course title: Artificial Intelligence: Principles and Techniques

Course description:

What do web search, speech recognition, face recognition, machine translation, autonomous driving, and automatic scheduling have in common? They are all complex real-world problems, and the goal of artificial intelligence (AI) is to tackle them with rigorous mathematical tools. In this course, you will learn the foundational principles that drive these applications and practice implementing some of these systems. Specific topics include machine learning, search, Markov decision processes, constraint satisfaction, graphical models, and logic. The main goal of the course is to equip you with the tools to tackle new AI problems you may encounter in life.

Course outline (partial):

  • Course overview
  • Machine learning
    • Linear classification
    • Loss minimization
    • Stochastic gradient descent (see the sketch after this outline)
    • Section: optimization, probability, Python (review)
    • Features and non-linearity
    • Neural networks, nearest neighbors
  • Search
  • Markov decision processes
    • Policy evaluation, policy improvement
    • Policy iteration, value iteration
    • Reinforcement learning
  • Games
  • Constraint satisfaction problems (Dorsa, Reid)
  • Bayesian networks
  • Logic
  • Conclusion
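The linear classification, loss minimization, and stochastic gradient descent items above fit together in one small algorithm. The following sketch shows SGD minimizing a hinge loss for a linear classifier; the toy data, the hinge loss, and the fixed step size are illustrative assumptions, not part of the course materials.

```python
# Minimal sketch: stochastic gradient descent on the hinge loss of a linear
# classifier (toy synthetic data and fixed step size are assumptions).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                 # 200 two-dimensional examples
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)    # labels in {-1, +1}

w = np.zeros(2)
eta = 0.1                                      # step size
for epoch in range(10):
    for i in rng.permutation(len(X)):
        margin = y[i] * (w @ X[i])
        if margin < 1:                         # hinge loss is active here
            w += eta * y[i] * X[i]             # gradient step on one example

accuracy = np.mean(np.sign(X @ w) == y)
print(f"training accuracy: {accuracy:.2f}")
```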

Instructor:

Percy Liang is an Associate Professor in the Computer Science and Statistics departments at Stanford University. His research focuses on natural language processing and statistical machine learning. Homepage: https://cs.stanford.edu/~pliang/


Latest paper

Recently, the Transformer has gained success in the automatic speech recognition (ASR) field. However, it is challenging to deploy a Transformer-based end-to-end (E2E) model for online speech recognition. In this paper, we propose a Transformer-based online CTC/attention E2E ASR architecture, which contains a chunk self-attention encoder (chunk-SAE) and a monotonic truncated attention (MTA) based self-attention decoder (SAD). Firstly, the chunk-SAE splits the speech into isolated chunks. To reduce the computational cost and improve the performance, we propose the state reuse chunk-SAE. Secondly, the MTA based SAD truncates the speech features monotonically and performs attention on the truncated features. To support online recognition, we integrate the state reuse chunk-SAE and the MTA based SAD into an online CTC/attention architecture. We evaluate the proposed online models on the HKUST Mandarin ASR benchmark and achieve a 23.66% character error rate (CER) with a 320 ms latency. Our online model yields as little as 0.19% absolute CER degradation compared with the offline baseline, and achieves significant improvement over our prior work on Long Short-Term Memory (LSTM) based online E2E models.
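To make the chunk splitting and state reuse ideas in the abstract concrete, here is a minimal sketch of chunk-wise self-attention that caches the states of previous chunks as left context; the chunk size, context length, and caching policy are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of chunk-wise self-attention with state reuse: each chunk
# attends to itself plus cached states of a few previous chunks, so the
# encoder can run online chunk by chunk. Hyperparameters are assumptions.
import torch
import torch.nn as nn

class StateReuseChunkAttention(nn.Module):
    def __init__(self, d_model=256, n_heads=4, chunk_size=16, left_chunks=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.chunk_size = chunk_size
        self.left_chunks = left_chunks  # how many past chunks to reuse

    def forward(self, feats):
        """feats: (batch, time, d_model) acoustic features of one utterance."""
        outputs, cache = [], []  # cache holds states of previous chunks
        for start in range(0, feats.size(1), self.chunk_size):
            chunk = feats[:, start:start + self.chunk_size]         # central chunk
            if cache:
                left = torch.cat(cache[-self.left_chunks:], dim=1)  # reused states
                context = torch.cat([left, chunk], dim=1)
            else:
                context = chunk
            # The chunk queries only itself and the cached left context,
            # avoiding recomputation over the full history.
            out, _ = self.attn(chunk, context, context)
            cache.append(out.detach())  # store states for the next chunk
            outputs.append(out)
        return torch.cat(outputs, dim=1)

# Example: one utterance with 100 frames of 256-dim features.
enc = StateReuseChunkAttention()
y = enc(torch.randn(1, 100, 256))
print(y.shape)  # torch.Size([1, 100, 256])
```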
