窥视未来：面向上下文偏置的语音识别方法 (Peeking Into The Future For Contextual Biasing)

While end-to-end (E2E) automatic speech recognition (ASR) models excel at general transcription, they struggle to recognize rare or unseen named entities (e.g., contact names, locations), which are critical for downstream applications like virtual assistants. In this paper, we propose a contextual biasing method for attention based encoder decoder (AED) models using a list of candidate named entities. Instead of predicting only the next token, we simultaneously predict multiple future tokens, enabling the model to "peek into the future" and score potential candidate entities in the entity list. Moreover, our approach leverages the multi-token prediction logits directly without requiring additional entity encoders or cross-attention layers, significantly reducing architectural complexity. Experiments on Librispeech demonstrate that our approach achieves up to 50.34% relative improvement in named entity word error rate compared to the baseline AED model.

翻译：尽管端到端（E2E）自动语音识别（ASR）模型在通用转录任务上表现出色，但在识别罕见或未见过的命名实体（如联系人姓名、地点）时仍存在困难，而这些实体对于虚拟助手等下游应用至关重要。本文提出了一种基于注意力机制的编码器-解码器（AED）模型的上下文偏置方法，该方法利用候选命名实体列表进行优化。与仅预测下一个标记的传统方法不同，我们同时预测多个未来标记，使模型能够“窥视未来”并对实体列表中的潜在候选实体进行评分。此外，我们的方法直接利用多标记预测的对数概率，无需额外的实体编码器或交叉注意力层，从而显著降低了架构复杂度。在Librispeech数据集上的实验表明，与基线AED模型相比，该方法在命名实体词错误率上实现了最高50.34%的相对提升。

相关内容

实体

关注 12

实体（entity）是有可区别性且独立存在的某种事物，但它不需要是物质上的存在。尤其是抽象和法律拟制也通常被视为实体。实体可被看成是一包含有子集的集合。在哲学里，这种集合被称为客体。实体可被使用来指涉某个可能是人、动物、植物或真菌等不会思考的生命、无生命物体或信念等的事物。在这一方面，实体可以被视为一全包的词语。有时，实体被当做本质的广义，不论即指的是否为物质上的存在，如时常会指涉到的无物质形式的实体－语言。更有甚者，实体有时亦指存在或本质本身。在法律上，实体是指能具有权利和义务的事物。这通常是指法人，但也包括自然人。

【CVPR 2022】基于Transformer的图象风格化，StyTr2: Image Style Transfer with Transformers

专知会员服务

11+阅读 · 2022年3月19日

【CVPR 2022】基于视觉-语言验证和迭代推理的视觉定位,Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation

专知会员服务

12+阅读 · 2022年3月19日

【CVPR 2022】MixFormer：跨窗口与维度的特征融合，MixFormer: Mixing Features across Windows and Dimensions

专知会员服务

15+阅读 · 2022年3月19日

【ACL2020-CMU-Google】MobileBERT:用于资源受限设备的任务无关“瘦版”BERT

专知会员服务

13+阅读 · 2020年4月9日