Improving Word Recognition in Speech Transcriptions by Decision-level Fusion of Stemming and Two-way Phoneme Pruning

26 July 2021

Sunakshi Mehra,Seba Susan

From arXiv. Accepted at the International Advanced Computing Conference (2020).

We introduce an unsupervised approach for correcting highly imperfect speech transcriptions, based on a decision-level fusion of stemming and two-way phoneme pruning. Transcripts are acquired from videos by extracting the audio with the FFmpeg framework and then converting the audio to a text transcript using the Google API. The benchmark LRW dataset has 500 word categories, with 50 videos per class in mp4 format. Each video consists of 29 frames (1.16 s in total), and the word appears in the middle of the video. Our approach aims to improve on the baseline accuracy of 9.34% by using stemming, phoneme extraction, filtering, and pruning. Applying the stemming algorithm to the text transcripts and evaluating the results yields 23.34% accuracy in word recognition. To convert words to phonemes, we use the Carnegie Mellon University (CMU) Pronouncing Dictionary, which provides a phonetic mapping of English words to their pronunciations. We propose a two-way phoneme pruning that comprises two non-sequential steps: 1) filtering and pruning the phonemes containing vowels and plosives; 2) filtering and pruning the phonemes containing vowels and fricatives. Applying decision-level fusion to the results of stemming and two-way phoneme pruning improves the word recognition rate up to 32.96%.
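The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: "filtering and pruning" is read here as dropping the phonemes of the listed classes and comparing what remains; the fusion rule is a simple majority vote over the three decisions, which the abstract does not specify; `toy_stem` is a hypothetical suffix stripper standing in for a real stemming algorithm; and `PRONUNCIATIONS` is a small hand-written table in ARPAbet style (stress digits omitted) rather than the actual CMU dictionary. The label set is likewise hypothetical.

```python
from collections import Counter

# ARPAbet phoneme classes used for the two pruning passes.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}
PLOSIVES = {"P", "B", "T", "D", "K", "G"}
FRICATIVES = {"F", "V", "TH", "DH", "S", "Z", "SH", "ZH", "HH"}

# Assumption: hand-written illustrative entries, not loaded from the dictionary.
PRONUNCIATIONS = {
    "ABOUT": ["AH", "B", "AW", "T"],
    "BOAT": ["B", "OW", "T"],
    "PRICE": ["P", "R", "AY", "S"],
    "PRIZE": ["P", "R", "AY", "Z"],
    "WORK": ["W", "ER", "K"],
    "WALK": ["W", "AO", "K"],
}

LABELS = ["ABOUT", "PRICE", "WORK"]  # hypothetical target word classes


def toy_stem(word):
    """Strip one common suffix; a toy stand-in for a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def match_by_stem(word, labels):
    """Return the label sharing a stem with the transcribed word, if any."""
    stem = toy_stem(word)
    for label in labels:
        if toy_stem(label) == stem:
            return label
    return None


def prune(phonemes, drop):
    """Remove every phoneme that belongs to the `drop` set."""
    return tuple(p for p in phonemes if p not in drop)


def match_by_pruning(word, labels, drop):
    """Return the label whose pruned phonemes equal the word's, if any."""
    phonemes = PRONUNCIATIONS.get(word.upper())
    if phonemes is None:
        return None
    key = prune(phonemes, drop)
    if not key:  # nothing left to compare, so abstain
        return None
    for label in labels:
        if prune(PRONUNCIATIONS.get(label, []), drop) == key:
            return label
    return None


def fuse(decisions):
    """Majority vote over non-empty decisions; ties go to the earliest."""
    votes = Counter(d for d in decisions if d is not None)
    return votes.most_common(1)[0][0] if votes else None


def recognize(word, labels=LABELS):
    """Fuse stemming with the two independent pruning passes."""
    return fuse([
        match_by_stem(word, labels),
        match_by_pruning(word, labels, VOWELS | PLOSIVES),
        match_by_pruning(word, labels, VOWELS | FRICATIVES),
    ])


if __name__ == "__main__":
    for heard in ("working", "boat", "prize", "walk", "hello"):
        print(heard, "->", recognize(heard))
```

In this sketch the two pruning passes run independently of each other, mirroring the "non-sequential" wording of the abstract, and a pass abstains when pruning leaves no phonemes to compare. A transcribed word such as "boat" is missed by stemming but recovered as "ABOUT" by the vowel-and-fricative pass, which is the kind of complementary evidence a decision-level fusion is meant to exploit.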
