CommonVoice-SpeechRE and RPG-MoGe: Advancing Speech Relation Extraction with a New Dataset and Multi-Order Generative Framework

Speech Relation Extraction (SpeechRE) aims to extract relation triplets directly from speech. However, existing benchmark datasets rely heavily on synthetic data, lacking sufficient quantity and diversity of real human speech. Moreover, existing models also suffer from rigid single-order generation templates and weak semantic alignment, substantially limiting their performance. To address these challenges, we introduce CommonVoice-SpeechRE, a large-scale dataset comprising nearly 20,000 real-human speech samples from diverse speakers, establishing a new benchmark for SpeechRE research. Furthermore, we propose the Relation Prompt-Guided Multi-Order Generative Ensemble (RPG-MoGe), a novel framework that features: (1) a multi-order triplet generation ensemble strategy, leveraging data diversity through diverse element orders during both training and inference, and (2) CNN-based latent relation prediction heads that generate explicit relation prompts to guide cross-modal alignment and accurate triplet generation. Experiments show our approach outperforms state-of-the-art methods, providing both a benchmark dataset and an effective solution for real-world SpeechRE. The source code and dataset are publicly available at https://github.com/NingJinzhong/SpeechRE_RPG_MoGe.
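To make the multi-order idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a triplet is linearized with slot markers (`<h>`, `<r>`, `<t>`, which are hypothetical), that all six permutations of head, relation, and tail serve as element orders, and that the ensemble is a simple vote across orders. The abstract does not specify the template tokens or the voting rule, and the example triplet is invented for illustration.

```python
from collections import Counter
from itertools import permutations

# Hypothetical slot markers; the source does not specify the template tokens.
SLOTS = ("head", "relation", "tail")
MARKERS = {"head": "<h>", "relation": "<r>", "tail": "<t>"}

# All element orders for a triplet: 3! = 6 permutations.
ORDERS = list(permutations(SLOTS))


def linearize(triplet, order):
    """Serialize one triplet (a dict keyed by slot) in the given element order."""
    return " ".join(f"{MARKERS[slot]} {triplet[slot]}" for slot in order)


def parse(sequence):
    """Recover a (head, relation, tail) tuple from a linearized sequence.

    Returns None when any slot is missing, so malformed outputs are skipped.
    """
    inverse = {marker: slot for slot, marker in MARKERS.items()}
    fields, current = {}, None
    for token in sequence.split():
        if token in inverse:
            current = inverse[token]
            fields[current] = []
        elif current is not None:
            fields[current].append(token)
    if set(fields) != set(SLOTS):
        return None
    return tuple(" ".join(fields[slot]) for slot in SLOTS)


def ensemble(outputs, min_votes):
    """Keep triplets that appear in at least min_votes of the per-order outputs.

    Assumed voting rule: each sequence in outputs comes from a different
    element order and carries one triplet.
    """
    votes = Counter()
    for sequence in outputs:
        triplet = parse(sequence)
        if triplet is not None:
            votes[triplet] += 1
    return sorted(t for t, n in votes.items() if n >= min_votes)


# Illustrative triplet (not taken from the dataset).
gold = {"head": "Marie Curie", "relation": "born in", "tail": "Warsaw"}
training_targets = [linearize(gold, order) for order in ORDERS]
```

In this sketch, training would use every entry of `training_targets` as a target for the same utterance, and inference would decode once per order and pass the decoded sequences to `ensemble`. Because every order parses back to the same canonical tuple, outputs from different orders can be compared directly.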
