Regularizing End-to-End Speech Translation with Triangular Decomposition Agreement

End-to-end speech-to-text translation (E2E-ST) is becoming increasingly popular because of its potential for less error propagation, lower latency, and fewer parameters. Given a triplet training corpus $\langle speech, transcription, translation\rangle$, a conventional high-quality E2E-ST system uses the $\langle speech, transcription\rangle$ pairs to pre-train the model and then the $\langle speech, translation\rangle$ pairs to optimize it further. However, each stage of this process involves only paired data, and this loose coupling fails to fully exploit the associations within the triplet data. In this paper, we model the joint probability of transcription and translation conditioned on the speech input so as to leverage the triplet data directly. On this basis, we propose a novel regularization method for model training that improves the agreement between the two decomposition paths of this joint probability, which should be equal in theory. To this end, we introduce two Kullback-Leibler divergence regularization terms into the training objective to reduce the mismatch between the output probabilities of the two paths. The trained model can then be naturally converted into an E2E-ST model by means of a pre-defined early-stop tag. Experiments on the MuST-C benchmark demonstrate that our approach significantly outperforms state-of-the-art E2E-ST baselines on all 8 language pairs, while also achieving better performance on the automatic speech recognition task. Our code is open-sourced at https://github.com/duyichao/E2E-ST-TDA.
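As an illustration of the agreement idea in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a toy joint distribution over two transcriptions and two translations (all numbers are hypothetical) and checks that the two chain-rule decompositions, $p(x|s)\,p(y|x,s)$ and $p(y|s)\,p(x|y,s)$, give the same joint probability. It then computes two directed Kullback-Leibler terms between the outputs of two paths that disagree. The helper names `kl_divergence` and `agreement_penalty` are introduced here for illustration only; how the paper applies the terms during training is not specified in the abstract.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def agreement_penalty(path_a, path_b):
    """Sum of the two directed KL terms between the two paths' distributions."""
    return kl_divergence(path_a, path_b) + kl_divergence(path_b, path_a)

# Toy joint p(x, y | s): rows index transcriptions x, columns index translations y.
# These numbers are hypothetical and chosen only for illustration.
joint = [[0.4, 0.1],
         [0.2, 0.3]]

p_x = [sum(row) for row in joint]                              # p(x | s)
p_y = [sum(joint[i][j] for i in range(2)) for j in range(2)]   # p(y | s)

# Path 1: transcribe first, then translate: p(x|s) * p(y|x,s).
path1 = [p_x[i] * (joint[i][j] / p_x[i]) for i in range(2) for j in range(2)]
# Path 2: translate first, then transcribe: p(y|s) * p(x|y,s).
path2 = [p_y[j] * (joint[i][j] / p_y[j]) for i in range(2) for j in range(2)]

# With exact conditionals both paths recover the same joint, so the penalty vanishes.
exact_penalty = agreement_penalty(path1, path2)

# An imperfect model may assign different probabilities along each path
# (hypothetical values); the penalty is then strictly positive.
model_path1 = [0.45, 0.10, 0.15, 0.30]
model_path2 = [0.35, 0.15, 0.20, 0.30]
mismatch_penalty = agreement_penalty(model_path1, model_path2)
```

In this sketch the penalty is zero (up to floating-point error) when the two paths agree and grows as they diverge, which is the property a regularizer of this kind relies on.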
