NLU++: A Multi-Label, Slot-Rich, Generalisable Dataset for Natural Language Understanding in Task-Oriented Dialogue

NLU · Task-oriented dialogue systems · Interpretability · Datasets · Lecture notes

28 April 2022

Iñigo Casanueva, Ivan Vulić, Georgios Spithourakis, Paweł Budzianowski

From arXiv: 16 pages, 1 figure, 10 tables. Accepted at NAACL 2022 (Findings).

We present NLU++, a novel dataset for natural language understanding (NLU) in task-oriented dialogue (ToD) systems, with the aim to provide a much more challenging evaluation environment for dialogue NLU models, up to date with the current application and industry requirements. NLU++ is divided into two domains (BANKING and HOTELS) and brings several crucial improvements over current commonly used NLU datasets. 1) NLU++ provides fine-grained domain ontologies with a large set of challenging multi-intent sentences, introducing and validating the idea of intent modules that can be combined into complex intents that convey complex user goals, combined with finer-grained and thus more challenging slot sets. 2) The ontology is divided into domain-specific and generic (i.e., domain-universal) intent modules that overlap across domains, promoting cross-domain reusability of annotated examples. 3) The dataset design has been inspired by the problems observed in industrial ToD systems, and 4) it has been collected, filtered and carefully annotated by dialogue NLU experts, yielding high-quality annotated data. Finally, we benchmark a series of current state-of-the-art NLU models on NLU++; the results demonstrate the challenging nature of the dataset, especially in low-data regimes, the validity of 'intent modularisation', and call for further research on ToD NLU.
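As a rough illustration of the intent-module idea described in the abstract, the sketch below treats each utterance as a set of intent modules and encodes it as a multi-hot vector. The module names, the example utterances, and the helper functions `build_label_space` and `encode` are all hypothetical, chosen only for illustration; they are not taken from the NLU++ ontology or from any released code.

```python
from typing import Iterable, List, Set

# Hypothetical intent modules, invented for this sketch only.
GENERIC_MODULES = ["change", "cancel", "when", "why"]  # assumed domain-universal
BANKING_MODULES = ["card", "transfer"]                 # assumed BANKING-specific
HOTELS_MODULES = ["booking", "room"]                   # assumed HOTELS-specific


def build_label_space(domain_modules: Iterable[str]) -> List[str]:
    """Put generic modules first so their indices are shared across domains."""
    return GENERIC_MODULES + list(domain_modules)


def encode(modules: Set[str], label_space: List[str]) -> List[int]:
    """Multi-hot encode a set of intent modules over a label space."""
    unknown = modules - set(label_space)
    if unknown:
        raise ValueError(f"modules not in label space: {sorted(unknown)}")
    return [1 if label in modules else 0 for label in label_space]


banking_space = build_label_space(BANKING_MODULES)
hotels_space = build_label_space(HOTELS_MODULES)

# Hypothetical utterance "Why was my card cancelled?" -> {why, cancel, card}
banking_vec = encode({"why", "cancel", "card"}, banking_space)
# Hypothetical utterance "When can I cancel my booking?" -> {when, cancel, booking}
hotels_vec = encode({"when", "cancel", "booking"}, hotels_space)

print(banking_space, banking_vec)
print(hotels_space, hotels_vec)
```

Under these assumptions, the generic module `cancel` occupies the same position in both label spaces, which is one simple way annotated examples from one domain could be reused as training signal in another.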
