Unsupervised Word Segmentation from Discrete Speech Units in Low-Resource Settings - 专知论文


June 8, 2021


Marcely Zanon Boito, Bolaji Yusuf, Lucas Ondel, Aline Villavicencio, Laurent Besacier

When documenting oral languages, Unsupervised Word Segmentation (UWS) from speech is a useful, yet challenging, task. It can be performed from phonetic transcriptions or, in the absence of these, from the output of unsupervised speech discretization models. These discretization models are trained using raw speech only, producing discrete speech units that can be applied to downstream (text-based) tasks. In this paper we compare five of these models, three Bayesian and two neural approaches, with regard to the exploitability of the produced units for UWS. We experiment with two UWS models and report results for Finnish, Hungarian, Mboshi, Romanian and Russian in a low-resource setting (using only 5k sentences). Our results suggest that neural models for speech discretization are difficult to exploit in our setting, and that it might be necessary to adapt them to limit sequence length. We obtain our best UWS results by using the SHMM and H-SHMM Bayesian models, which produce high-quality, yet compressed, discrete representations of the input speech signal.
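The abstract does not say how unit sequences would be shortened or segmented, so the following is only an illustrative sketch of the data shapes involved, not the authors' method. It assumes that a discretization model emits one integer unit ID per frame, that sequence length can be reduced by merging runs of identical consecutive units (a common simplification, assumed here), and that a UWS model outputs boundary positions. The function names `collapse_repeats` and `apply_boundaries`, the unit IDs, and the boundary indices are all hypothetical.

```python
from itertools import groupby


def collapse_repeats(units):
    """Merge each run of identical consecutive units into a single unit.

    Assumption for illustration: this is one simple way to shorten a
    frame-level unit sequence; the paper may use something different.
    """
    return [unit for unit, _ in groupby(units)]


def apply_boundaries(units, boundaries):
    """Split a unit sequence into word candidates at the given indices.

    Each boundary index b marks a cut between units[b - 1] and units[b].
    """
    words, start = [], 0
    for b in sorted(boundaries):
        words.append(units[start:b])
        start = b
    words.append(units[start:])
    return words


# Hypothetical frame-level output of a discretization model.
frame_units = [12, 12, 12, 7, 7, 31, 31, 31, 31, 5, 12, 12]

compressed = collapse_repeats(frame_units)
# Hypothetical boundaries, as a UWS model might propose them.
words = apply_boundaries(compressed, [2, 4])

print(compressed)  # [12, 7, 31, 5, 12]
print(words)       # [[12, 7], [31, 5], [12]]
```

Here the twelve frame-level units shrink to five before segmentation, which shows in miniature why a more compressed representation gives a text-based segmenter less to search over.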

