MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations

Wenxiang Guo, Changhao Pan, Zhiyuan Zhu, Xintong Hu, Yu Zhang, Li Tang, Rui Yang, Han Wang, Zongbao Zhang, Yuhan Wang, Yixuan Chen, Hankun Xu, Ke Xu, Pengfei Fan, Zhetao Chen, Yanhao Yu, Qiange Huang, Fei Wu, Zhou Zhao

From arXiv, 24 pages

Humans rely on multisensory integration to perceive spatial environments, where auditory cues enable sound source localization in three-dimensional space. Despite the critical role of spatial audio in immersive technologies such as VR/AR, most existing multimodal datasets provide only monaural audio, which limits the development of spatial audio generation and understanding. To address this limitation, we introduce MRSAudio, a large-scale multimodal spatial audio dataset designed to advance research in spatial audio understanding and generation. MRSAudio comprises four distinct components: MRSLife, MRSSpeech, MRSMusic, and MRSSing, covering diverse real-world scenarios. The dataset includes synchronized binaural and ambisonic audio, exocentric and egocentric video, motion trajectories, and fine-grained annotations such as transcripts, phoneme boundaries, lyrics, scores, and prompts. To demonstrate the utility and versatility of MRSAudio, we establish five foundational tasks: audio spatialization, spatial text-to-speech, spatial singing voice synthesis, spatial music generation, and sound event localization and detection. Results show that MRSAudio enables high-quality spatial modeling and supports a broad range of spatial audio research. Demos and dataset access are available at https://mrsaudio.github.io.
