上至下议长作为处理后进行拨号 (End-to-End Speaker Diarization as Post-Processing) - 专知论文

会员服务 ·

0

端到端 · Performer · state-of-the-art · 簇 · MoDELS ·

2020 年 12 月 23 日

End-to-End Speaker Diarization as Post-Processing

翻译：上至下议长作为处理后进行拨号

Shota Horiguchi,Paola Garcia,Yusuke Fujita,Shinji Watanabe,Kenji Nagamatsu

This paper investigates the utilization of an end-to-end diarization model as post-processing of conventional clustering-based diarization. Clustering-based diarization methods partition frames into clusters of the number of speakers; thus, they typically cannot handle overlapping speech because each frame is assigned to one speaker. On the other hand, some end-to-end diarization methods can handle overlapping speech by treating the problem as multi-label classification. Although some methods can treat a flexible number of speakers, they do not perform well when the number of speakers is large. To compensate for each other's weakness, we propose to use a two-speaker end-to-end diarization method as post-processing of the results obtained by a clustering-based method. We iteratively select two speakers from the results and update the results of the two speakers to improve the overlapped region. Experimental results show that the proposed algorithm consistently improved the performance of the state-of-the-art methods across CALLHOME, AMI, and DIHARD II datasets.

翻译：本文调查了将端到端的二分化模式用作传统集束化二分化的后处理方法的利用情况。基于集束化的二分化方法分割框架分组成发言者人数的组群;因此,通常无法处理重叠的演讲,因为每个框架分配给一名发言者。另一方面,一些端到端的二分化方法可以将问题作为多标签分类处理,从而处理重叠的演讲。虽然有些方法可以处理一些灵活的发言者,但在发言者人数众多时,它们的表现并不很好。为了弥补彼此的弱点,我们提议使用两声端到端的二分化方法作为集法处理后处理结果的方法。我们反复从结果中挑选两名发言者,并更新两位发言者的结果,以改进重叠的区域。实验结果显示,拟议的算法始终在改善全AWHOME、AMI和DIHARD II数据集的状态方法的性能。

0

相关内容

端到端

ICLR2021放榜了！ 687篇入选34篇得满分！ 48篇orals，108篇spotlights，531篇poster

ICLR2021放榜了！ 687篇入选34篇得满分！ 48篇orals，108篇spotlights，531篇poster

专知会员服务

23+阅读 · 2021年1月13日

区块链白皮书（2020年），60页pdf

区块链白皮书（2020年），60页pdf

专知会员服务

91+阅读 · 2021年1月5日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

122+阅读 · 2020年11月20日

商业数据分析，39页ppt

商业数据分析，39页ppt

专知会员服务

157+阅读 · 2020年6月2日

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

专知会员服务

18+阅读 · 2020年2月26日

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

专知会员服务

25+阅读 · 2020年2月16日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

86+阅读 · 2020年2月12日

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

专知会员服务

41+阅读 · 2019年11月12日

经典书《机器学习：概率视角》（Machine Learning: a Probabilistic Perspective）第二版Python代码，附1098页pdf下载

经典书《机器学习：概率视角》（Machine Learning: a Probabilistic Perspective）第二版Python代码，附1098页pdf下载

专知会员服务

253+阅读 · 2019年10月25日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

77+阅读 · 2019年10月10日

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

IEEE | DSC 2019诚邀稿件 (EI检索)

IEEE | DSC 2019诚邀稿件 (EI检索)

Call4Papers

10+阅读 · 2019年2月25日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

16+阅读 · 2018年12月24日

文本情感分析的预处理

文本情感分析的预处理

Datartisan数据工匠

17+阅读 · 2018年3月8日

【论文推荐】最新6篇目标检测（Object Detection）相关论文—物体链接、手机端、三维地图、航空图像、检测与姿态估计

【论文推荐】最新6篇目标检测（Object Detection）相关论文—物体链接、手机端、三维地图、航空图像、检测与姿态估计

专知

8+阅读 · 2018年2月5日

人工智能 | 国际会议/SCI期刊约稿信息9条

人工智能 | 国际会议/SCI期刊约稿信息9条

Call4Papers

3+阅读 · 2018年1月12日

论文报告 | Graph-based Neural Multi-Document Summarization

论文报告 | Graph-based Neural Multi-Document Summarization

科技创新与创业

15+阅读 · 2017年12月15日

NLP中自动生产文摘（auto text summarization）

NLP中自动生产文摘（auto text summarization）

机器学习研究会

14+阅读 · 2017年10月10日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

Arxiv

3+阅读 · 2020年6月9日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

Text Summarization with Pretrained Encoders

Arxiv

5+阅读 · 2019年8月22日

Meta Learning for Task-Driven Video Summarization

Arxiv

6+阅读 · 2019年7月29日

Fine-grained robust prosody transfer for single-speaker neural text-to-speech

Fine-grained robust prosody transfer for single-speaker neural text-to-speech

Arxiv

5+阅读 · 2019年7月4日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

Neural source-filter-based waveform model for statistical parametric speech synthesis

Arxiv

4+阅读 · 2018年11月26日

Are Generative Classifiers More Robust to Adversarial Attacks?

Are Generative Classifiers More Robust to Adversarial Attacks?

Arxiv

4+阅读 · 2018年7月9日

End-to-End Speech Recognition From the Raw Waveform

Arxiv

3+阅读 · 2018年6月19日

End-to-End Multi-Task Learning with Attention

Arxiv

19+阅读 · 2018年3月28日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

ICLR2021放榜了！ 687篇入选34篇得满分！ 48篇orals，108篇spotlights，531篇poster

ICLR2021放榜了！ 687篇入选34篇得满分！ 48篇orals，108篇spotlights，531篇poster

专知会员服务

23+阅读 · 2021年1月13日

区块链白皮书（2020年），60页pdf

区块链白皮书（2020年），60页pdf

专知会员服务

91+阅读 · 2021年1月5日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

122+阅读 · 2020年11月20日

商业数据分析，39页ppt

商业数据分析，39页ppt

专知会员服务

157+阅读 · 2020年6月2日

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

专知会员服务

18+阅读 · 2020年2月26日

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

专知会员服务

25+阅读 · 2020年2月16日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

86+阅读 · 2020年2月12日

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

专知会员服务

41+阅读 · 2019年11月12日

经典书《机器学习：概率视角》（Machine Learning: a Probabilistic Perspective）第二版Python代码，附1098页pdf下载

经典书《机器学习：概率视角》（Machine Learning: a Probabilistic Perspective）第二版Python代码，附1098页pdf下载

专知会员服务

253+阅读 · 2019年10月25日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

77+阅读 · 2019年10月10日

热门VIP内容

相关资讯

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

IEEE | DSC 2019诚邀稿件 (EI检索)

IEEE | DSC 2019诚邀稿件 (EI检索)

Call4Papers

10+阅读 · 2019年2月25日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

16+阅读 · 2018年12月24日

文本情感分析的预处理

文本情感分析的预处理

Datartisan数据工匠

17+阅读 · 2018年3月8日

【论文推荐】最新6篇目标检测（Object Detection）相关论文—物体链接、手机端、三维地图、航空图像、检测与姿态估计

【论文推荐】最新6篇目标检测（Object Detection）相关论文—物体链接、手机端、三维地图、航空图像、检测与姿态估计

专知

8+阅读 · 2018年2月5日

人工智能 | 国际会议/SCI期刊约稿信息9条

人工智能 | 国际会议/SCI期刊约稿信息9条

Call4Papers

3+阅读 · 2018年1月12日

论文报告 | Graph-based Neural Multi-Document Summarization

论文报告 | Graph-based Neural Multi-Document Summarization

科技创新与创业

15+阅读 · 2017年12月15日

NLP中自动生产文摘（auto text summarization）

NLP中自动生产文摘（auto text summarization）

机器学习研究会

14+阅读 · 2017年10月10日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

Arxiv

3+阅读 · 2020年6月9日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

Text Summarization with Pretrained Encoders

Arxiv

5+阅读 · 2019年8月22日

Meta Learning for Task-Driven Video Summarization

Arxiv

6+阅读 · 2019年7月29日

Fine-grained robust prosody transfer for single-speaker neural text-to-speech

Fine-grained robust prosody transfer for single-speaker neural text-to-speech

Arxiv

5+阅读 · 2019年7月4日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

Neural source-filter-based waveform model for statistical parametric speech synthesis

Arxiv

4+阅读 · 2018年11月26日

Are Generative Classifiers More Robust to Adversarial Attacks?

Are Generative Classifiers More Robust to Adversarial Attacks?

Arxiv

4+阅读 · 2018年7月9日

End-to-End Speech Recognition From the Raw Waveform

Arxiv

3+阅读 · 2018年6月19日

End-to-End Multi-Task Learning with Attention

Arxiv

19+阅读 · 2018年3月28日

微信扫码咨询专知VIP会员