De-STT: 使用反排斥的遗忘来消除在对文本系统讲话中的不想要的扰动和偏见 (De-STT: De-entaglement of unwanted Nuisances and Biases in Speech to Text System using Adversarial Forgetting) - 专知论文

会员服务 ·

0

INFORMS · 有偏 · 训练集 · MoDELS · 学成 ·

2021 年 6 月 5 日

De-STT: De-entaglement of unwanted Nuisances and Biases in Speech to Text System using Adversarial Forgetting

翻译：De-STT: 使用反排斥的遗忘来消除在对文本系统讲话中的不想要的扰动和偏见

Hemant Yadav,Janvijay Singh,Atul Anshuman Singh,Rachit Mittal,Rajiv Ratn Shah

from arxiv, Need to add substantial findings during the new experiments, which would mean to update the claim made in the paper

Training a robust Speech to Text (STT) system requires tens of thousands of hours of data. Variabilities present in the dataset such as unwanted nuisances (environmental noise, etc) and biases (accent, gender, age, etc) are reasons for the need of large datasets to learn general representations, which is often not feasible for low resource languages. In many computer vision tasks, a recently proposed adversarial forgetting approach to remove unwanted features has produced good results. This motivates us to study the effect of de-entangling the accent information from the input speech signal while training STT systems. To this end, we use an information bottleneck architecture based on adversarial forgetting. This training scheme aims to enforce the model to learn general accent invariant speech representations. Two STT models trained on just 20 hrs of audio, with and without adversarial forgetting, are tested on two unseen accents not present in the training set. The results favour the adversarial forgetting scheme with an absolute average improvement of 6\% over the standard training scheme. Furthermore, we also observe an absolute improvement of 5.5\% when tested on the seen accents present in the training set.

翻译：强力的文字演讲(STT)系统需要数万小时的数据。数据集中存在的不想要的干扰(环境噪音等)和偏见(强烈、性别、年龄等)等差异是需要大型数据集来学习一般表达方式的原因,而对于低资源语言来说,这些数据往往不可行。在许多计算机愿景任务中,最近提出的消除不想要的特征的对抗式忘却方法产生了良好的结果。这促使我们在培训STT系统时研究从输入演讲信号中脱钩口音信息的效果。为此,我们使用基于对抗性遗忘的信息瓶颈结构。本培训计划的目的是执行学习一般口头表达方式的模式。仅用20小时的音频、有对抗性遗忘或不对抗性遗忘方式培训的两个STT模型在培训组合中没有出现的两种看不见的口音上进行了测试。结果有利于对抗性遗忘计划,比标准培训计划得到绝对平均6 ⁇ 的改进。此外,我们还观察到,在对培训组合中显示的口音进行测试时,5.5 ⁇ 的绝对改进。

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【Google】平滑对抗训练，Smooth Adversarial Training

【Google】平滑对抗训练，Smooth Adversarial Training

专知会员服务

49+阅读 · 2020年7月4日

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

专知会员服务

23+阅读 · 2020年4月22日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

【论文推荐】最新5篇语音识别（ASR）相关论文—音频对抗样本、对抗性语音识别系统、声学模型、序列到序列、口语可理解性矫正

【论文推荐】最新5篇语音识别（ASR）相关论文—音频对抗样本、对抗性语音识别系统、声学模型、序列到序列、口语可理解性矫正

专知

14+阅读 · 2018年2月4日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Arxiv

7+阅读 · 2021年6月11日

VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs

Arxiv

3+阅读 · 2021年1月29日

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Arxiv

7+阅读 · 2019年10月8日

Interpretable Adversarial Training for Text

Interpretable Adversarial Training for Text

Arxiv

5+阅读 · 2019年5月30日

End-to-end Speech Recognition with Word-based RNN Language Models

End-to-end Speech Recognition with Word-based RNN Language Models

Arxiv

3+阅读 · 2018年8月8日

End-to-End Speech Recognition From the Raw Waveform

Arxiv

3+阅读 · 2018年6月19日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

Correlated discrete data generation using adversarial training

Arxiv

5+阅读 · 2018年4月3日

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

Arxiv

18+阅读 · 2018年1月5日

Denoising Adversarial Autoencoders

Arxiv

9+阅读 · 2018年1月4日

VIP会员

文章信息

相关主题

相关VIP内容

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【Google】平滑对抗训练，Smooth Adversarial Training

【Google】平滑对抗训练，Smooth Adversarial Training

专知会员服务

49+阅读 · 2020年7月4日

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

专知会员服务

23+阅读 · 2020年4月22日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

操作系统智能体：基于多模态大模型（MLLM）的通用计算设备智能体综述

《美国太空军系统全生命周期建模、仿真与分析效能提升方案》最新84页报告

【博士论文】推进数据高效的深度学习：非参数 Transformer、主动测试与上下文学习

自主人工智能：未来战争是否将是自主化的？

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

【论文推荐】最新5篇语音识别（ASR）相关论文—音频对抗样本、对抗性语音识别系统、声学模型、序列到序列、口语可理解性矫正

【论文推荐】最新5篇语音识别（ASR）相关论文—音频对抗样本、对抗性语音识别系统、声学模型、序列到序列、口语可理解性矫正

专知

14+阅读 · 2018年2月4日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Arxiv

7+阅读 · 2021年6月11日

VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs

Arxiv

3+阅读 · 2021年1月29日

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Arxiv

7+阅读 · 2019年10月8日

Interpretable Adversarial Training for Text

Interpretable Adversarial Training for Text

Arxiv

5+阅读 · 2019年5月30日

End-to-end Speech Recognition with Word-based RNN Language Models

End-to-end Speech Recognition with Word-based RNN Language Models

Arxiv

3+阅读 · 2018年8月8日

End-to-End Speech Recognition From the Raw Waveform

Arxiv

3+阅读 · 2018年6月19日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

Correlated discrete data generation using adversarial training

Arxiv

5+阅读 · 2018年4月3日

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

Arxiv

18+阅读 · 2018年1月5日

Denoising Adversarial Autoencoders

Arxiv

9+阅读 · 2018年1月4日

微信扫码咨询专知VIP会员