Conversational systems are of primary interest in the AI community. Chatbots are increasingly being deployed to provide round-the-clock support and to increase customer engagement. Many of the commercial bot building frameworks follow a standard approach that requires one to build and train an intent model to recognize a user input. Intent models are trained in a supervised setting with a collection of textual utterance and intent label pairs. Gathering a substantial and wide coverage of training data for different intent is a bottleneck in the bot building process. Moreover, the cost of labeling a hundred to thousands of conversations with intent is a time consuming and laborious job. In this paper, we present an intent discovery framework that involves 4 primary steps: Extraction of textual utterances from a conversation using a pre-trained domain agnostic Dialog Act Classifier (Data Extraction), automatic clustering of similar user utterances (Clustering), manual annotation of clusters with an intent label (Labeling) and propagation of intent labels to the utterances from the previous step, which are not mapped to any cluster (Label Propagation); to generate intent training data from raw conversations. We have introduced a novel density-based clustering algorithm ITER-DBSCAN for unbalanced data clustering. Subject Matter Expert (Annotators with domain expertise) manually looks into the clustered user utterances and provides an intent label for discovery. We conducted user studies to validate the effectiveness of the trained intent model generated in terms of coverage of intents, accuracy and time saving concerning manual annotation. Although the system is developed for building an intent model for the conversational system, this framework can also be used for a short text clustering or as a labeling framework.


翻译:互换系统是AI 社区的主要兴趣所在。 聊天室正在越来越多地部署, 以提供全天候支持和增加客户参与。 许多商用机器人建筑框架都遵循标准方法, 要求建立和训练一种识别用户投入的意向模型。 意向模型在监督环境下经过培训, 收集了文本表达和意向标签配对。 为不同目的收集大量和广泛的培训数据是机器人建设过程中的一个瓶颈。 此外, 标注成百上千次有意对话的成本是耗时和艰苦的专长。 在本文中, 我们展示了一个意向发现框架, 包括4个主要步骤: 使用预先培训过的域域描述分析法案分类(DataGripationon), 将类似的用户表达(Clustering)自动组合, 以模型标签( Weabiling) 和意图标签框架传播到前一步骤的口述, 用于构建任何意向的意向准确度(Label Propaggation), 我们展示了一个用于数据库的系统, 将目标数据升级的版本用于数据库。

0
下载
关闭预览

相关内容

【干货书】真实机器学习,264页pdf,Real-World Machine Learning
Stabilizing Transformers for Reinforcement Learning
专知会员服务
57+阅读 · 2019年10月17日
强化学习最新教程,17页pdf
专知会员服务
167+阅读 · 2019年10月11日
Transferring Knowledge across Learning Processes
CreateAMind
25+阅读 · 2019年5月18日
Unsupervised Learning via Meta-Learning
CreateAMind
41+阅读 · 2019年1月3日
A Technical Overview of AI & ML in 2018 & Trends for 2019
待字闺中
16+阅读 · 2018年12月24日
已删除
将门创投
4+阅读 · 2018年1月19日
Towards Topic-Guided Conversational Recommender System
VIP会员
相关VIP内容
【干货书】真实机器学习,264页pdf,Real-World Machine Learning
Stabilizing Transformers for Reinforcement Learning
专知会员服务
57+阅读 · 2019年10月17日
强化学习最新教程,17页pdf
专知会员服务
167+阅读 · 2019年10月11日
相关资讯
Transferring Knowledge across Learning Processes
CreateAMind
25+阅读 · 2019年5月18日
Unsupervised Learning via Meta-Learning
CreateAMind
41+阅读 · 2019年1月3日
A Technical Overview of AI & ML in 2018 & Trends for 2019
待字闺中
16+阅读 · 2018年12月24日
已删除
将门创投
4+阅读 · 2018年1月19日
Top
微信扫码咨询专知VIP会员