Combination of Domain Knowledge and Deep Learning for Sentiment Analysis of Short and Informal Messages on Social Media

From arXiv. A preprint of an article accepted for publication by Inderscience in the International Journal of Computational Vision and Robotics in September 2018.

Sentiment analysis has recently emerged as one of the major natural language processing (NLP) tasks in many applications. It has become increasingly crucial as social media channels (e.g. social networks or forums) have turned into significant sources for brands to observe user opinions about their products. However, real data obtained from social media contains a high volume of short and informal messages posted by users. Existing approaches, especially those based on deep learning, have difficulty handling this kind of data. In this paper, we propose an approach to this problem. It extends our previous work, in which we proposed combining Convolutional Neural Networks, a typical deep learning technique, with domain knowledge. The combination is used to obtain additional training data through augmentation and to define a more reasonable loss function. In this work, we further improve our architecture with several substantial enhancements: negation-based data augmentation, transfer learning for word embeddings, the combination of word-level and character-level embeddings, and a multitask learning technique for attaching domain knowledge rules to the learning process. These enhancements, which specifically target short and informal messages, yield a significant improvement in performance in experiments on real datasets.
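To make the idea of negation-based data augmentation more concrete, the following is a minimal sketch of one way such a step could look. It is not the authors' procedure, which the abstract does not describe. The lexicon, the negator list, the function name augment_with_negation, and the rule "insert 'not' before the single sentiment word and flip the label" are all assumptions chosen for illustration.

```python
# Hypothetical illustration only: the lexicon, negators, and flipping rule
# are assumptions, not the augmentation procedure used in the paper.
NEGATORS = ("not", "never", "no")

# Tiny made-up domain lexicon mapping sentiment words to a polarity.
LEXICON = {
    "good": "positive",
    "great": "positive",
    "bad": "negative",
    "slow": "negative",
}

FLIP = {"positive": "negative", "negative": "positive"}


def augment_with_negation(text, label):
    """Return a list with at most one extra (text, label) training pair.

    A new sample is created only when the message contains exactly one
    lexicon word, that word's polarity agrees with the message label,
    and the word is not already preceded by a negator.
    """
    tokens = text.lower().split()
    hits = [i for i, tok in enumerate(tokens) if tok in LEXICON]
    if len(hits) != 1:
        return []  # ambiguous or no sentiment word: skip augmentation
    i = hits[0]
    if LEXICON[tokens[i]] != label:
        return []  # lexicon disagrees with the label: skip
    if i > 0 and tokens[i - 1] in NEGATORS:
        return []  # already negated: skip
    negated = tokens[:i] + ["not"] + tokens[i:]
    return [(" ".join(negated), FLIP[label])]


if __name__ == "__main__":
    print(augment_with_negation("Battery is good", "positive"))
    print(augment_with_negation("not good", "negative"))
```

Restricting the rule to messages with a single sentiment word keeps the flipped label plausible for short messages. A message with several sentiment words would need a more careful rule than this sketch provides.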
