通过自残注意力进行音频合成的标记式MRI序列制导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导 (Tagged-MRI Sequence to Audio Synthesis via Self Residual Attention Guided Heterogeneous Translator) - 专知论文

会员服务 ·

0

Attention · 可理解性 · Guidance · 成对型 · 相关系数 ·

2022 年 9 月 25 日

Tagged-MRI Sequence to Audio Synthesis via Self Residual Attention Guided Heterogeneous Translator

翻译：通过自残注意力进行音频合成的标记式MRI序列制导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导导

Xiaofeng Liu,Fangxu Xing,Jerry L. Prince,Jiachen Zhuo,Maureen Stone,Georges El Fakhri,Jonghye Woo

from arxiv, MICCAI 2022 (early accept, Oral Presentation ~3%)

Understanding the underlying relationship between tongue and oropharyngeal muscle deformation seen in tagged-MRI and intelligible speech plays an important role in advancing speech motor control theories and treatment of speech related-disorders. Because of their heterogeneous representations, however, direct mapping between the two modalities -- i.e., two-dimensional (mid-sagittal slice) plus time tagged-MRI sequence and its corresponding one-dimensional waveform -- is not straightforward. Instead, we resort to two-dimensional spectrograms as an intermediate representation, which contains both pitch and resonance, from which to develop an end-to-end deep learning framework to translate from a sequence of tagged-MRI to its corresponding audio waveform with limited dataset size.~Our framework is based on a novel fully convolutional asymmetry translator with guidance of a self residual attention strategy to specifically exploit the moving muscular structures during speech.~In addition, we leverage a pairwise correlation of the samples with the same utterances with a latent space representation disentanglement strategy.~Furthermore, we incorporate an adversarial training approach with generative adversarial networks to offer improved realism on our generated spectrograms.~Our experimental results, carried out with a total of 63 tagged-MRI sequences alongside speech acoustics, showed that our framework enabled the generation of clear audio waveforms from a sequence of tagged-MRI, surpassing competing methods. Thus, our framework provides the great potential to help better understand the relationship between the two modalities.

翻译：理解在标记的磁共振和感知性演讲中看到的舌头和眼部肌肉变形之间的根本关系。但是,由于两种模式(即二维(中成片)加上时间标记的磁共振序列及其相应的一维波形)之间的直接映射,并非直截了当。相反,我们采用两维光谱图作为中间代表,其中包括投影和共振,从而在推进语音发动机控制理论和处理与语音有关的病症方面发挥着重要作用。然而,由于这两种模式(即二维(中成片)加上时间标记的磁共振动肌肉变形序列)与相应的一维波形波形序列之间的直接映射图,我们采用两维对立式的深层次学习框架,从标记-磁共振的序列转换成相应的声波变形模型。我们的框架基于一种新型的对立式辩论式训练方法, 与更清晰的磁共振动式的图像框架相比, 提供了更清晰的磁共振动模型, 展示了更清晰的图像结构框架。

0

相关内容

Attention

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

41+阅读 · 2022年6月30日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

59+阅读 · 2022年4月22日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

25+阅读 · 2021年4月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

76+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

161+阅读 · 2020年3月18日

【CIKM2019 Tutorial】Recent Developments of Deep Heterogeneous Information Network Analysis（深度异构信息网络分析的最新进展），附157页PDF免费下载

【CIKM2019 Tutorial】Recent Developments of Deep Heterogeneous Information Network Analysis（深度异构信息网络分析的最新进展），附157页PDF免费下载

专知会员服务

27+阅读 · 2019年11月3日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

54+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

30+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

169+阅读 · 2019年10月11日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

23+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

26+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

16+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

专知

25+阅读 · 2018年2月6日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

Serglycin调控TGF-β信号通路诱导EMT促进膀胱癌转移机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

高血压患者Corin基因变异对其蛋白结构及酶功能影响的研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于TGF-β1／Smads信号通路研究艾灸对RA模型大鼠心功能的影响及其干预的代谢组学研究

国家自然科学基金

0+阅读 · 2013年12月31日

密集小基站系统中的新型接入理论与技术研究

国家自然科学基金

0+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

Caveolae和Rho激酶信号传导通路在PPARs调节血管内皮细胞中缝隙连接蛋白的作用

国家自然科学基金

0+阅读 · 2012年12月31日

脂联素拮抗瘦素致血管细胞外基质（ECM）重构的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

地鼠克隆胚胎Kcnq1ot1 印记基因的DNA甲基化研究

国家自然科学基金

0+阅读 · 2009年12月31日

针灸治疗大鼠CD肠纤维化Smads与ERK-1/2MAPK信号通路Cross talk研究

国家自然科学基金

0+阅读 · 2009年12月31日

肾上腺源性及原发性高血压线粒体tRNAIle、tRNALeu(UUR)和tRNAlys基因突变的差异对比研究

国家自然科学基金

0+阅读 · 2009年12月31日

DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP

Arxiv

0+阅读 · 2022年11月2日

PLATO-K: Internal and External Knowledge Enhanced Dialogue Generation

Arxiv

0+阅读 · 2022年11月2日

Adapting self-supervised models to multi-talker speech recognition using speaker embeddings

Arxiv

0+阅读 · 2022年11月1日

Speech-text based multi-modal training with bidirectional attention for improved speech recognition

Arxiv

0+阅读 · 2022年11月1日

The Importance of Accurate Alignments in End-to-End Speech Synthesis

Arxiv

0+阅读 · 2022年10月31日

Relating Human Perception of Musicality to Prediction in a Predictive Coding Model

Arxiv

0+阅读 · 2022年10月29日

Video-based Remote Physiological Measurement via Self-supervised Learning

Arxiv

0+阅读 · 2022年10月28日

Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis

Arxiv

0+阅读 · 2022年10月28日

Evaluating context-invariance in unsupervised speech representations

Arxiv

0+阅读 · 2022年10月27日

Attention Bottlenecks for Multimodal Fusion

Arxiv

30+阅读 · 2021年6月30日

VIP会员

文章信息

相关主题

相关VIP内容

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

41+阅读 · 2022年6月30日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

59+阅读 · 2022年4月22日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

25+阅读 · 2021年4月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

76+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

161+阅读 · 2020年3月18日

【CIKM2019 Tutorial】Recent Developments of Deep Heterogeneous Information Network Analysis（深度异构信息网络分析的最新进展），附157页PDF免费下载

【CIKM2019 Tutorial】Recent Developments of Deep Heterogeneous Information Network Analysis（深度异构信息网络分析的最新进展），附157页PDF免费下载

专知会员服务

27+阅读 · 2019年11月3日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

54+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

30+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

169+阅读 · 2019年10月11日

热门VIP内容

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

23+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

26+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

16+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

专知

25+阅读 · 2018年2月6日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

相关论文

DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP

Arxiv

0+阅读 · 2022年11月2日

PLATO-K: Internal and External Knowledge Enhanced Dialogue Generation

Arxiv

0+阅读 · 2022年11月2日

Adapting self-supervised models to multi-talker speech recognition using speaker embeddings

Arxiv

0+阅读 · 2022年11月1日

Speech-text based multi-modal training with bidirectional attention for improved speech recognition

Arxiv

0+阅读 · 2022年11月1日

The Importance of Accurate Alignments in End-to-End Speech Synthesis

Arxiv

0+阅读 · 2022年10月31日

Relating Human Perception of Musicality to Prediction in a Predictive Coding Model

Arxiv

0+阅读 · 2022年10月29日

Video-based Remote Physiological Measurement via Self-supervised Learning

Arxiv

0+阅读 · 2022年10月28日

Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis

Arxiv

0+阅读 · 2022年10月28日

Evaluating context-invariance in unsupervised speech representations

Arxiv

0+阅读 · 2022年10月27日

Attention Bottlenecks for Multimodal Fusion

Arxiv

30+阅读 · 2021年6月30日

相关基金

Serglycin调控TGF-β信号通路诱导EMT促进膀胱癌转移机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

高血压患者Corin基因变异对其蛋白结构及酶功能影响的研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于TGF-β1／Smads信号通路研究艾灸对RA模型大鼠心功能的影响及其干预的代谢组学研究

国家自然科学基金

0+阅读 · 2013年12月31日

密集小基站系统中的新型接入理论与技术研究

国家自然科学基金

0+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

Caveolae和Rho激酶信号传导通路在PPARs调节血管内皮细胞中缝隙连接蛋白的作用

国家自然科学基金

0+阅读 · 2012年12月31日

脂联素拮抗瘦素致血管细胞外基质（ECM）重构的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

地鼠克隆胚胎Kcnq1ot1 印记基因的DNA甲基化研究

国家自然科学基金

0+阅读 · 2009年12月31日

针灸治疗大鼠CD肠纤维化Smads与ERK-1/2MAPK信号通路Cross talk研究

国家自然科学基金

0+阅读 · 2009年12月31日

肾上腺源性及原发性高血压线粒体tRNAIle、tRNALeu(UUR)和tRNAlys基因突变的差异对比研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员