These are notes on Stanford's Natural Language Processing course by Dan Jurafsky & Chris Manning.
Sentiment analysis goes by many names, such as opinion extraction, opinion mining, sentiment mining, and subjectivity analysis. They all mean essentially the same thing, though each suggests a somewhat different application scenario. Broadly, sentiment analysis has the following applications:
Products: product reviews. Beyond simple thumbs-up/thumbs-down judgments, sentiment analysis can assess people's opinions about specific attributes of specific products: extract aspects/attributes from a product review, judge the sentiment toward each, and aggregate the results.
Public sentiment: public opinion, e.g. consumer confidence indices and stock indices. Bollen et al. (2011, Twitter mood predicts the stock market) used the Calm mood index from Twitter to predict the Dow Jones index, and the method has since been applied in industrial settings.
Twitter Sentiment App is a mature product that analyzes people's sentiment toward a brand or topic from Twitter data.
Attitudes: “enduring, affectively colored beliefs, dispositions towards objects or persons”
Put simply, sentiment analysis analyzes people's attitudes toward things. An attitude has the following elements (using "Mary likes the movie" as an example):
Holder (source) of attitude
Target (aspect) of attitude
Target: the movie
Type of attitude
From a set of types: Like, love, hate, value, desire, etc.
Or (more commonly) simple weighted polarity: positive, negative, neutral, together with strength
Text containing the attitude
Text: Mary likes the movie
Sentence or entire document
The simplest sentiment analysis task, i.e. the baseline task in this area, is predicting whether a movie review is positive or negative.
A commonly used corpus is the IMDB Polarity Data 2.0.
Goal: polarity detection, i.e. is this review positive or negative?
Classification using different classifiers
Besides the usual tokenization issues such as handling HTML/XML markup, sentiment analysis may also need to handle:
Twitter markup (hashtags, etc.)
Useful tokenizer code:
Christopher Potts sentiment tokenizer
Brendan O’Connor twitter tokenizer
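A minimal regex tokenizer in the spirit of the tokenizers above; the emoticon, hashtag, and URL patterns here are simplified assumptions for illustration, not the original code:

```python
import re

# Simplified token pattern: emoticons, @mentions, hashtags, URLs, words.
# A sketch of the idea behind sentiment-aware tokenizers, not the real ones.
TOKEN_RE = re.compile(r"""
    [<>]?[:;=8][\-o\*']?[\)\]\(\[dDpP/\\]   # emoticons such as :-) and ;P
  | @\w+                                    # @mentions
  | \#\w+                                   # hashtags like #nlp
  | https?://\S+                            # URLs
  | [a-zA-Z]+(?:'[a-zA-Z]+)?                # words, including don't
  | \S                                      # any other non-space character
""", re.VERBOSE)

def tokenize(text):
    return TOKEN_RE.findall(text)
```

For example, tokenize("@bob I don't like #Avatar :-(") keeps the mention, the hashtag, and the emoticon as single tokens instead of splitting them apart.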
A simple way to handle negation (see the references below) is to prepend NOT_ to every word between a negation token and the next punctuation mark:
didn't like this movie, but I
didn't NOT_like NOT_this NOT_movie but I
Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79—86.
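The NOT_ marking above can be sketched as follows; the negation-word list is a small assumed sample, not the full list from the papers:

```python
import re

# A small assumed sample of negation tokens; the papers use a longer list.
NEGATIONS = {"not", "no", "never", "n't", "didn't", "don't", "won't", "isn't"}
CLAUSE_END = re.compile(r"[.,;!?]")

def mark_negation(tokens):
    """Prefix NOT_ to every token between a negation word and the next
    clause-level punctuation mark (Das & Chen 2001; Pang et al. 2002)."""
    out, negated = [], False
    for tok in tokens:
        if CLAUSE_END.match(tok):
            negated = False          # punctuation closes the negation scope
            out.append(tok)
        elif negated:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if tok.lower() in NEGATIONS:
                negated = True       # start a negation scope
    return out
```

Running it on the tokens of "didn't like this movie, but I" reproduces the marked sequence shown above.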
Words to use
As a baseline model we use Naive Bayes, no surprises there. The computation:
Prior: how likely we see a positive movie review
Likelihood Function: for every review, how likely every word is expressed by a positive movie review
Use Laplace (add-one) smoothing.
A variant/improvement is Binarized (Boolean-feature) Multinomial Naive Bayes. The intuition: for sentiment analysis, whether a word occurs (word occurrence) matters more than how many times it occurs (word frequency). For example, one occurrence of "fantastic" already signals positive sentiment, while five occurrences of "fantastic" add little extra information. Boolean multinomial Naive Bayes simply clips all word counts greater than 1 to 1.
Some studies find that the intermediate value log(freq(w)) works even better; related papers:
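A minimal sketch of binarized multinomial Naive Bayes with add-one smoothing; the tiny training set below is invented for illustration:

```python
import math
from collections import Counter

def train_boolean_nb(docs):
    """docs: list of (tokens, label). Each word is counted at most once per
    document (the Boolean/binarized variant), with add-one smoothing."""
    labels = Counter(lab for _, lab in docs)
    word_counts = {lab: Counter() for lab in labels}
    for tokens, lab in docs:
        word_counts[lab].update(set(tokens))   # clip counts > 1 down to 1
    vocab = {w for c in word_counts.values() for w in c}
    priors = {lab: math.log(n / len(docs)) for lab, n in labels.items()}

    def classify(tokens):
        def score(lab):
            total = sum(word_counts[lab].values()) + len(vocab)
            return priors[lab] + sum(
                math.log((word_counts[lab][w] + 1) / total)
                for w in set(tokens) if w in vocab)
        return max(labels, key=score)
    return classify

# Invented toy training data:
docs = [("a fantastic great movie".split(), "pos"),
        ("great fun fantastic".split(), "pos"),
        ("boring terrible movie".split(), "neg"),
        ("terrible awful plot".split(), "neg")]
classify = train_boolean_nb(docs)
```

Note that repeating "fantastic" in a test review does not change its score, which is exactly the intuition behind the Boolean variant.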
B. Pang, L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79–86.
V. Metsis, I. Androutsopoulos, G. Paliouras. 2006. Spam Filtering with Naive Bayes – Which Naive Bayes? CEAS 2006, Third Conference on Email and Anti-Spam.
K.-M. Schneider. 2004. On word frequency information and negative evidence in Naive Bayes text classification. ICANLP, 474-485.
JD Rennie, L Shih, J Teevan. 2003. Tackling the poor assumptions of Naive Bayes text classifiers. ICML 2003.
Of course, in practice MaxEnt and SVM classifiers perform considerably better than Naive Bayes.
Sentiment can also be expressed subtly, with no overtly negative words at all:
“If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut.”
Another problem is order effects: a review may pile up positive sentiment words and then negate everything at the end, which Naive Bayes clearly cannot handle.
The General Inquirer
List of Categories
LIWC(Linguistic Inquiry and Word Count)
MPQA Subjectivity Cues Lexicon
Bing Liu Opinion Lexicon
Compare the disagreements between the polarity lexicons:
So how do we analyze the polarity of each word in the IMDB corpus?
How likely is each word to appear in each sentiment class?
Make them comparable between words - Scaled likelihood:
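The scaled likelihood P(w|c) / P(w) can be sketched as follows; the counts below are invented for illustration:

```python
from collections import Counter

def scaled_likelihood(word, cls, counts):
    """counts: {class: Counter of word frequencies}.
    Returns P(word|class) / P(word), which makes likelihoods comparable
    across words of different overall frequency."""
    p_w_given_c = counts[cls][word] / sum(counts[cls].values())
    total = sum(sum(c.values()) for c in counts.values())
    p_w = sum(c[word] for c in counts.values()) / total
    return p_w_given_c / p_w

# Invented toy counts:
counts = {"pos": Counter({"great": 8, "movie": 10, "bad": 2}),
          "neg": Counter({"great": 2, "movie": 10, "bad": 8})}
```

Here scaled_likelihood("great", "pos", counts) is well above 1 (the word is over-represented in positive reviews), while a neutral word like "movie" scores close to 1 in both classes.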
See more in Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.
Beyond existing lexicons, we can also learn our own sentiment lexicon from our own corpus.
Starting from a small amount of labeled data plus hand-built patterns, use bootstrapping to learn the lexicon.
Paper: Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting the Semantic Orientation of Adjectives. ACL, 174–181
Adjectives conjoined by AND tend to have the same polarity:
fair and legitimate, corrupt and brutal
Adjectives conjoined by BUT tend to have opposite polarity:
fair but brutal
1. Label a seed set of 1336 adjectives: 657 positive, 679 negative.
2. Expand the seed set by searching Google for conjoined adjectives, e.g. "was nice and".
3. A supervised classifier assigns each word pair a polarity-similarity score from count(AND) and count(BUT).
4. Partition the resulting graph into a positive and a negative cluster.
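Steps 3 and 4 can be crudely sketched as sign propagation over conjunction links: AND edges vote for "same polarity", BUT edges for "opposite", and we 2-color the graph starting from a seed word. The conjunction counts below are invented, and BFS is a simplified stand-in for the actual clustering step:

```python
from collections import deque, defaultdict

def propagate_polarity(edges, seed, seed_sign=1):
    """edges: {(w1, w2): (and_count, but_count)}. An edge means 'same
    polarity' if AND conjunctions outnumber BUT, else 'opposite'.
    Propagate +1/-1 signs from the seed word by BFS."""
    graph = defaultdict(list)
    for (a, b), (n_and, n_but) in edges.items():
        same = n_and >= n_but
        graph[a].append((b, same))
        graph[b].append((a, same))
    signs = {seed: seed_sign}
    queue = deque([seed])
    while queue:
        w = queue.popleft()
        for nbr, same in graph[w]:
            if nbr not in signs:
                signs[nbr] = signs[w] if same else -signs[w]
                queue.append(nbr)
    return signs

# Invented conjunction counts for the adjectives in the examples above:
edges = {("fair", "legitimate"): (9, 1),   # "fair and legitimate"
         ("fair", "brutal"): (0, 5),       # "fair but brutal"
         ("corrupt", "brutal"): (7, 0)}    # "corrupt and brutal"
signs = propagate_polarity(edges, seed="fair", seed_sign=1)
```

Starting from "fair" as positive, the AND/BUT links correctly place "legitimate" in the positive cluster and "brutal"/"corrupt" in the negative one.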
Paper: Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews
1. Extract two-word phrases containing adjectives from the reviews.
2. Learn the polarity of each phrase.
How do we measure a phrase's polarity?
Positive phrases co-occur more with “excellent”
Negative phrases co-occur more with “poor”
Use PMI (Pointwise Mutual Information) to measure co-occurrence.
Mutual information between two random variables X and Y: I(X; Y) = Σ_x Σ_y P(x, y) log2 [ P(x, y) / (P(x) P(y)) ]
Pointwise mutual information: how much more do events x and y co-occur than if they were independent: PMI(x, y) = log2 [ P(x, y) / (P(x) P(y)) ]
P(word) = hits(word)/N
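With P(word) = hits(word)/N, Turney's score Polarity(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor") reduces to a ratio of hit counts, since the corpus size N cancels. A sketch with hypothetical hit counts:

```python
import math

def turney_polarity(hits_near_excellent, hits_near_poor,
                    hits_excellent, hits_poor):
    """Polarity(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor").
    Substituting P(word) = hits(word)/N, every N cancels and the score
    becomes a log ratio of search-engine hit counts (Turney 2002)."""
    return math.log2((hits_near_excellent * hits_poor) /
                     (hits_near_poor * hits_excellent))

# Hypothetical hit counts for some phrase, invented for illustration:
score = turney_polarity(hits_near_excellent=80, hits_near_poor=10,
                        hits_excellent=2000, hits_poor=1500)
```

A positive score means the phrase leans positive; a negative score means it leans negative.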
3. Rate a review by the average polarity of its phrases
The baseline accuracy is typically 59%; the Turney algorithm raises it to 74%.
S.M. Kim and E. Hovy. 2004. Determining the sentiment of opinions. COLING 2004
M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of KDD, 2004
Start with a small set of positive/negative seed words.
Find the seed words' synonyms and antonyms in WordNet.
Positive set: synonyms of positive words + antonyms of negative words.
Negative set: synonyms of negative words + antonyms of positive words.
Repeat the expansion until a stopping condition is reached.
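The expansion loop can be sketched with a toy thesaurus standing in for WordNet; the synonym/antonym entries below are invented:

```python
# Toy thesaurus standing in for WordNet; entries are invented.
SYNONYMS = {"good": ["great", "fine"], "great": ["excellent"],
            "bad": ["poor", "awful"], "poor": ["terrible"]}
ANTONYMS = {"good": ["bad"], "bad": ["good"]}

def expand_lexicon(pos_seeds, neg_seeds, iterations=5):
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(iterations):
        # synonyms keep polarity, antonyms flip it
        new_pos = ({s for w in pos for s in SYNONYMS.get(w, [])}
                   | {a for w in neg for a in ANTONYMS.get(w, [])})
        new_neg = ({s for w in neg for s in SYNONYMS.get(w, [])}
                   | {a for w in pos for a in ANTONYMS.get(w, [])})
        if new_pos <= pos and new_neg <= neg:   # stopping condition
            break
        pos |= new_pos
        neg |= new_neg
    return pos, neg

pos, neg = expand_lexicon({"good"}, {"bad"})
```

Starting from just {good} and {bad}, the loop pulls in great/fine/excellent on the positive side and poor/awful/terrible on the negative side, then stops when no new words appear.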
can be domain-specific
can be more robust (for more/new words)
starts with a seed set of words (good, poor)
find other words that have similar polarity:
• Using “and” and “but”
• Using words that occur nearby in the same document
• Using WordNet synonyms and antonyms
M. Hu and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of KDD.
S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop.
The food was great but the service was awful!
This review is positive about the food but negative about the service. In such cases we cannot simply classify the whole review as positive or negative; we need to classify its attitude along the food and service dimensions separately. Where do these dimensions, a.k.a. attributes/aspects/targets, come from? There are two approaches: extract frequent phrases plus rules from the text, or define the attributes/aspects in advance.
For the first approach, find high-frequency phrases in the product reviews, then filter them with rules, e.g. keep phrases that immediately follow a sentiment word: "…great fish tacos" suggests fish tacos is a candidate aspect.
For some domains such as restaurants/hotels the aspects are fairly standardized, so in practice we can manually annotate product reviews with aspects (e.g. food, décor, service, value, NONE) and then train a classifier to assign each sentence/phrase to an aspect.
Note that the baseline method assumes all classes occur with equal probability. When classes are imbalanced (as they usually are in practice), we cannot evaluate with accuracy and need F-scores instead; the more severe the imbalance, the worse the classifier is likely to perform.
Resampling in training
That is, if there are 10^6 positive examples but only 10^4 negative ones, draw the training data for both classes at the 10^4 scale.
Alternatively, increase the penalty for misclassifying the rarer class (penalize the SVM more for misclassification of the rare class).
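Undersampling the majority class can be sketched as follows, assuming a list of (text, label) pairs:

```python
import random
from collections import defaultdict

def undersample(examples, seed=0):
    """Randomly downsample every class to the size of the rarest one,
    so e.g. 10^6 pos vs 10^4 neg becomes 10^4 vs 10^4."""
    by_label = defaultdict(list)
    for x, y in examples:
        by_label[y].append((x, y))
    n = min(len(v) for v in by_label.values())
    rng = random.Random(seed)            # fixed seed for reproducibility
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, n))
    rng.shuffle(balanced)
    return balanced

# Invented imbalanced toy data: 100 positive vs 10 negative examples.
data = [(f"pos doc {i}", "pos") for i in range(100)] + \
       [(f"neg doc {i}", "neg") for i in range(10)]
balanced = undersample(data)
```

The balanced set keeps all 10 negative examples and a random 10 of the 100 positive ones, at the cost of discarding most of the majority-class data.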
Paper: Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. ACL, 115–124
Map to binary
Collapse the ratings to positive/negative, e.g. treat ratings above 3.5 as positive and the rest as negative.
Use linear or ordinal regression
or specialized models like metric labeling
Typically one builds a classification or regression model to predict the binary/ordinal label.
For some tasks, using the full vocabulary in Naive Bayes performs better.
Hand-built polarity lexicons
Use seeds and semi-supervised learning to induce lexicons
• Detecting annoyed callers to dialogue system
• Detecting confused/frustrated versus confident students
• Finding traumatized or depressed writers
• Detection of flirtation or friendliness in conversations
• Detection of extroverts
E.g., Detection of Friendliness
Friendly speakers use collaborative conversational style
- Less use of negative emotional words
- More sympathy
That’s too bad I’m sorry to hear that!
- More agreement
I think so too!
- Fewer hedges
kind of sort of a little …
Aspect level sentiment classification aims to identify the sentiment expressed towards an aspect given a context sentence. Previous neural network based methods largely ignore the syntax structure in one sentence. In this paper, we propose a novel target-dependent graph attention network (TD-GAT) for aspect level sentiment classification, which explicitly utilizes the dependency relationship among words. Using the dependency graph, it propagates sentiment features directly from the syntactic context of an aspect target. In our experiments, we show our method outperforms multiple baselines with GloVe embeddings. We also demonstrate that using BERT representations further substantially boosts the performance.
Aspect-based sentiment analysis (ABSA), which aims to identify fine-grained opinion polarity towards a specific aspect, is a challenging subtask of sentiment analysis (SA). In this paper, we construct an auxiliary sentence from the aspect and convert ABSA to a sentence-pair classification task, such as question answering (QA) and natural language inference (NLI). We fine-tune the pre-trained model from BERT and achieve new state-of-the-art results on SentiHood and SemEval-2014 Task 4 datasets.
Sentiment analysis is a widely studied NLP task where the goal is to determine opinions, emotions, and evaluations of users towards a product, an entity or a service that they are reviewing. One of the biggest challenges for sentiment analysis is that it is highly language dependent. Word embeddings, sentiment lexicons, and even annotated data are language specific. Further, optimizing models for each language is very time consuming and labor intensive especially for recurrent neural network models. From a resource perspective, it is very challenging to collect data for different languages. In this paper, we look for an answer to the following research question: can a sentiment analysis model trained on a language be reused for sentiment analysis in other languages, Russian, Spanish, Turkish, and Dutch, where the data is more limited? Our goal is to build a single model in the language with the largest dataset available for the task, and reuse it for languages that have limited resources. For this purpose, we train a sentiment analysis model using recurrent neural networks with reviews in English. We then translate reviews in other languages and reuse this model to evaluate the sentiments. Experimental results show that our robust approach of single model trained on English reviews statistically significantly outperforms the baselines in several different languages.
We propose a novel approach to multimodal sentiment analysis using deep neural networks combining visual analysis and natural language processing. Our goal is different than the standard sentiment analysis goal of predicting whether a sentence expresses positive or negative sentiment; instead, we aim to infer the latent emotional state of the user. Thus, we focus on predicting the emotion word tags attached by users to their Tumblr posts, treating these as "self-reported emotions." We demonstrate that our multimodal model combining both text and image features outperforms separate models based solely on either images or text. Our model's results are interpretable, automatically yielding sensible word lists associated with emotions. We explore the structure of emotions implied by our model and compare it to what has been posited in the psychology literature, and validate our model on a set of images that have been used in psychology studies. Finally, our work also provides a useful tool for the growing academic study of images - both photographs and memes - on social networks.
Aspect based sentiment analysis (ABSA) can provide more detailed information than general sentiment analysis, because it aims to predict the sentiment polarities of the given aspects or entities in text. We summarize previous approaches into two subtasks: aspect-category sentiment analysis (ACSA) and aspect-term sentiment analysis (ATSA). Most previous approaches employ long short-term memory and attention mechanisms to predict the sentiment polarity of the concerned targets, which are often complicated and need more training time. We propose a model based on convolutional neural networks and gating mechanisms, which is more accurate and efficient. First, the novel Gated Tanh-ReLU Units can selectively output the sentiment features according to the given aspect or entity. The architecture is much simpler than attention layer used in the existing models. Second, the computations of our model could be easily parallelized during training, because convolutional layers do not have time dependency as in LSTM layers, and gating units also work independently. The experiments on SemEval datasets demonstrate the efficiency and effectiveness of our models.
Sentiment analysis is a key component in various text mining applications. Numerous sentiment classification techniques, including conventional and deep learning-based methods, have been proposed in the literature. In most existing methods, a high-quality training set is assumed to be given. Nevertheless, constructing a high-quality training set that consists of highly accurate labels is challenging in real applications. This difficulty stems from the fact that text samples usually contain complex sentiment representations, and their annotation is subjective. We address this challenge in this study by leveraging a new labeling strategy and utilizing a two-level long short-term memory network to construct a sentiment classifier. Lexical cues are useful for sentiment analysis, and they have been utilized in conventional studies. For example, polar and privative words play important roles in sentiment analysis. A new encoding strategy, that is, $\rho$-hot encoding, is proposed to alleviate the drawbacks of one-hot encoding and thus effectively incorporate useful lexical cues. We compile three Chinese data sets on the basis of our label strategy and proposed methodology. Experiments on the three data sets demonstrate that the proposed method outperforms state-of-the-art algorithms.
Sentiment analysis is proven to be very useful tool in many applications regarding social media. This has led to a great surge of research in this field. Hence, in this paper, we compile the baselines for such research. In this paper, we explore three different deep-learning based architectures for multimodal sentiment classification, each improving upon the previous. Further, we evaluate these architectures with multiple datasets with fixed train/test partition. We also discuss some major issues, frequently ignored in multimodal sentiment analysis research, e.g., role of speaker-exclusive models, importance of different modalities, and generalizability. This framework illustrates the different facets of analysis to be considered while performing multimodal sentiment analysis and, hence, serves as a new benchmark for future research in this emerging field. We draw a comparison among the methods using empirical data, obtained from the experiments. In the future, we plan to focus on extracting semantics from visual features, cross-modal features and fusion.
Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Along with the success of deep learning in many other application domains, deep learning is also popularly used in sentiment analysis in recent years. This paper first gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis.
Sentiment Analysis (SA) is a major field of study in natural language processing, computational linguistics and information retrieval. Interest in SA has been constantly growing in both academia and industry over the recent years. Moreover, there is an increasing need for generating appropriate resources and datasets in particular for low resource languages including Persian. These datasets play an important role in designing and developing appropriate opinion mining platforms using supervised, semi-supervised or unsupervised methods. In this paper, we outline the entire process of developing a manually annotated sentiment corpus, SentiPers, which covers formal and informal written contemporary Persian. To the best of our knowledge, SentiPers is a unique sentiment corpus with such a rich annotation in three different levels including document-level, sentence-level, and entity/aspect-level for Persian. The corpus contains more than 26000 sentences of users opinions from digital product domain and benefits from special characteristics such as quantifying the positiveness or negativity of an opinion through assigning a number within a specific range to any given sentence. Furthermore, we present statistics on various components of our corpus as well as studying the inter-annotator agreement among the annotators. Finally, some of the challenges that we faced during the annotation process will be discussed as well.
This project addresses the problem of sentiment analysis in twitter; that is classifying tweets according to the sentiment expressed in them: positive, negative or neutral. Twitter is an online micro-blogging and social-networking platform which allows users to write short status updates of maximum length 140 characters. It is a rapidly expanding service with over 200 million registered users - out of which 100 million are active users and half of them log on twitter on a daily basis - generating nearly 250 million tweets per day. Due to this large amount of usage we hope to achieve a reflection of public sentiment by analysing the sentiments expressed in the tweets. Analysing the public sentiment is important for many applications such as firms trying to find out the response of their products in the market, predicting political elections and predicting socioeconomic phenomena like stock exchange. The aim of this project is to develop a functional classifier for accurate and automatic sentiment classification of an unknown tweet stream.