Research on intelligent dialogue systems has largely pursued two motives separately: task-oriented dialogue (TOD) systems and open-domain chit-chat (CC) systems. Although existing TOD systems perform well on benchmark test sets, they tend to fail in natural, real-world scenarios, where user utterances show high motive diversity and mix TOD and CC within a multi-turn interaction. Because an industrial TOD system should be able to move between TOD and CC motives when conversing with a user, constructing a fuse-motive dialogue dataset that contains both TOD and CC is important. Most prior work relies on crowd workers to collect and annotate large-scale datasets and is restricted to English. Our work, in contrast, addresses this problem in a more effective way and releases a multi-turn dialogue dataset called CCET (Chinese Chat-Enhanced-Task). We also propose a line of formalization approaches for fuse-motive dialogues, along with several evaluation metrics for TOD sessions into which CC utterances are integrated.