We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including neural network and template-based models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than other systems. The results highlight the potential of coupling ensemble systems with deep reinforcement learning as a fruitful path for developing real-world, open-domain conversational agents.

10
下载
关闭预览

相关内容

深度强化学习 (DRL) 是一种使用深度学习技术扩展传统强化学习方法的一种机器学习方法。 传统强化学习方法的主要任务是使得主体根据从环境中获得的奖赏能够学习到最大化奖赏的行为。然而,传统无模型强化学习方法需要使用函数逼近技术使得主体能够学习出值函数或者策略。在这种情况下,深度学习强大的函数逼近能力自然成为了替代人工指定特征的最好手段并为性能更好的端到端学习的实现提供了可能。

Time Series Classification (TSC) is an important and challenging problem in data mining. With the increase of time series data availability, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising as deep learning has seen very successful applications in the last years. DNNs have indeed revolutionized the field of computer vision especially with the advent of novel deeper architectures such as Residual and Convolutional Neural Networks. Apart from images, sequential data such as text and audio can also be processed with DNNs to reach state-of-the-art performance for document classification and speech recognition. In this article, we study the current state-of-the-art performance of deep learning algorithms for TSC by presenting an empirical study of the most recent DNN architectures for TSC. We give an overview of the most successful deep learning applications in various time series domains under a unified taxonomy of DNNs for TSC. We also provide an open source deep learning framework to the TSC community where we implemented each of the compared approaches and evaluated them on a univariate TSC benchmark (the UCR/UEA archive) and 12 multivariate time series datasets. By training 8,730 deep learning models on 97 time series datasets, we propose the most exhaustive study of DNNs for TSC to date.

0
8
下载
预览

The majority of conversations a dialogue agent sees over its lifetime occur after it has already been trained and deployed, leaving a vast store of potential training signal untapped. In this work, we propose the self-feeding chatbot, a dialogue agent with the ability to extract new training examples from the conversations it participates in. As our agent engages in conversation, it also estimates user satisfaction in its responses. When the conversation appears to be going well, the user's responses become new training examples to imitate. When the agent believes it has made a mistake, it asks for feedback; learning to predict the feedback that will be given improves the chatbot's dialogue abilities further. On the PersonaChat chit-chat dataset with over 131k training examples, we find that learning from dialogue with a self-feeding chatbot significantly improves performance, regardless of the amount of traditional supervision.

0
4
下载
预览

This paper presents a novel approach for synthesizing automatically age-progressed facial images in video sequences using Deep Reinforcement Learning. The proposed method models facial structures and the longitudinal face-aging process of given subjects coherently across video frames. The approach is optimized using a long-term reward, Reinforcement Learning function with deep feature extraction from Deep Convolutional Neural Network. Unlike previous age-progression methods that are only able to synthesize an aged likeness of a face from a single input image, the proposed approach is capable of age-progressing facial likenesses in videos with consistently synthesized facial features across frames. In addition, the deep reinforcement learning method guarantees preservation of the visual identity of input faces after age-progression. Results on videos of our new collected aging face AGFW-v2 database demonstrate the advantages of the proposed solution in terms of both quality of age-progressed faces, temporal smoothness, and cross-age face verification.

0
3
下载
预览

Automatic generation of paraphrases from a given sentence is an important yet challenging task in natural language processing (NLP), and plays a key role in a number of applications such as question answering, search, and dialogue. In this paper, we present a deep reinforcement learning approach to paraphrase generation. Specifically, we propose a new framework for the task, which consists of a \textit{generator} and an \textit{evaluator}, both of which are learned from data. The generator, built as a sequence-to-sequence learning model, can produce paraphrases given a sentence. The evaluator, constructed as a deep matching model, can judge whether two sentences are paraphrases of each other. The generator is first trained by deep learning and then further fine-tuned by reinforcement learning in which the reward is given by the evaluator. For the learning of the evaluator, we propose two methods based on supervised learning and inverse reinforcement learning respectively, depending on the type of available training data. Empirical study shows that the learned evaluator can guide the generator to produce more accurate paraphrases. Experimental results demonstrate the proposed models (the generators) outperform the state-of-the-art methods in paraphrase generation in both automatic evaluation and human evaluation.

0
3
下载
预览

This paper presents a new multi-objective deep reinforcement learning (MODRL) framework based on deep Q-networks. We propose the use of linear and non-linear methods to develop the MODRL framework that includes both single-policy and multi-policy strategies. The experimental results on two benchmark problems including the two-objective deep sea treasure environment and the three-objective mountain car problem indicate that the proposed framework is able to converge to the optimal Pareto solutions effectively. The proposed framework is generic, which allows implementation of different deep reinforcement learning algorithms in different complex environments. This therefore overcomes many difficulties involved with standard multi-objective reinforcement learning (MORL) methods existing in the current literature. The framework creates a platform as a testbed environment to develop methods for solving various problems associated with the current MORL. Details of the framework implementation can be referred to http://www.deakin.edu.au/~thanhthi/drl.htm.

0
9
下载
预览

Accelerated by the tremendous increase in Internet bandwidth and storage space, video data has been generated, published and spread explosively, becoming an indispensable part of today's big data. In this paper, we focus on reviewing two lines of research aiming to stimulate the comprehension of videos with deep learning: video classification and video captioning. While video classification concentrates on automatically labeling video clips based on their semantic contents like human actions or complex events, video captioning attempts to generate a complete and natural sentence, enriching the single label as in video classification, to capture the most informative dynamics in videos. In addition, we also provide a review of popular benchmarks and competitions, which are critical for evaluating the technical progress of this vibrant field.

0
8
下载
预览

Recommender systems play a crucial role in mitigating the problem of information overload by suggesting users' personalized items or services. The vast majority of traditional recommender systems consider the recommendation procedure as a static process and make recommendations following a fixed strategy. In this paper, we propose a novel recommender system with the capability of continuously improving its strategies during the interactions with users. We model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies via recommending trial-and-error items and receiving reinforcements of these items from users' feedbacks. In particular, we introduce an online user-agent interacting environment simulator, which can pre-train and evaluate model parameters offline before applying the model online. Moreover, we validate the importance of list-wise recommendations during the interactions between users and agent, and develop a novel approach to incorporate them into the proposed framework LIRD for list-wide recommendations. The experimental results based on a real-world e-commerce dataset demonstrate the effectiveness of the proposed framework.

0
12
下载
预览

With the ever-growing volume, complexity and dynamicity of online information, recommender system has been an effective key solution to overcome such information overload. In recent years, deep learning's revolutionary advances in speech recognition, image analysis and natural language processing have gained significant attention. Meanwhile, recent studies also demonstrate its effectiveness in coping with information retrieval and recommendation tasks. Applying deep learning techniques into recommender system has been gaining momentum due to its state-of-the-art performances and high-quality recommendations. In contrast to traditional recommendation models, deep learning provides a better understanding of user's demands, item's characteristics and historical interactions between them. This article aims to provide a comprehensive review of recent research efforts on deep learning based recommender systems towards fostering innovations of recommender system research. A taxonomy of deep learning based recommendation models is presented and used to categorize the surveyed articles. Open problems are identified based on the analytics of the reviewed works and potential solutions discussed.

0
4
下载
预览
小贴士
相关论文
H. Ismail Fawaz,G. Forestier,J. Weber,L. Idoumghar,P. Muller
8+阅读 · 2019年3月14日
Learning from Dialogue after Deployment: Feed Yourself, Chatbot!
Braden Hancock,Antoine Bordes,Pierre-Emmanuel Mazare,Jason Weston
4+阅读 · 2019年1月16日
Chi Nhan Duong,Khoa Luu,Kha Gia Quach,Nghia Nguyen,Eric Patterson,Tien D. Bui,Ngan Le
3+阅读 · 2018年11月27日
Learning to Coordinate Multiple Reinforcement Learning Agents for Diverse Query Reformulation
Rodrigo Nogueira,Jannis Bulian,Massimiliano Ciaramita
3+阅读 · 2018年9月27日
Paraphrase Generation with Deep Reinforcement Learning
Zichao Li,Xin Jiang,Lifeng Shang,Hang Li
3+阅读 · 2018年8月23日
A Multi-Objective Deep Reinforcement Learning Framework
Thanh Thi Nguyen
9+阅读 · 2018年6月27日
Marc Brittain,Peng Wei
3+阅读 · 2018年5月18日
Zuxuan Wu,Ting Yao,Yanwei Fu,Yu-Gang Jiang
8+阅读 · 2018年2月22日
Xiangyu Zhao,Liang Zhang,Zhuoye Ding,Dawei Yin,Yihong Zhao,Jiliang Tang
12+阅读 · 2018年1月5日
Shuai Zhang,Lina Yao,Aixin Sun
4+阅读 · 2017年8月3日
相关资讯
Transferring Knowledge across Learning Processes
CreateAMind
6+阅读 · 2019年5月18日
Unsupervised Learning via Meta-Learning
CreateAMind
26+阅读 · 2019年1月3日
RL 真经
CreateAMind
4+阅读 · 2018年12月28日
Hierarchical Imitation - Reinforcement Learning
CreateAMind
15+阅读 · 2018年5月25日
Reinforcement Learning: An Introduction 2018第二版 500页
CreateAMind
9+阅读 · 2018年4月27日
推荐|深度强化学习聊天机器人(附论文)!
全球人工智能
4+阅读 · 2018年1月30日
【推荐】直接未来预测:增强学习监督学习
机器学习研究会
6+阅读 · 2017年11月24日
Deep Reinforcement Learning 深度增强学习资源
数据挖掘入门与实战
5+阅读 · 2017年11月4日
Top