Efficient Feature Selection techniques for Sentiment Analysis

Sentiment analysis is a field of study that focuses on identifying the ideas expressed in text and classifying them as positive, negative, or neutral in polarity. Feature selection is a crucial process in machine learning. In this paper, we study the performance of different feature selection techniques for sentiment analysis. Term Frequency Inverse Document Frequency (TF-IDF) is used as the feature extraction technique for creating the feature vocabulary. We experiment with various Feature Selection (FS) techniques to select the best set of features from this vocabulary. The selected features are used to train different machine learning classifiers: Logistic Regression (LR), Support Vector Machines (SVM), Decision Tree (DT), and Naive Bayes (NB). The ensemble techniques Bagging and Random Subspace are applied to these classifiers to improve their performance on sentiment analysis. We show that the best FS techniques, when combined with ensemble methods, achieve remarkable results on sentiment analysis. We also compare FS methods trained with Bagging and Random Subspace against a variety of neural network architectures. We show that FS techniques trained with ensemble classifiers outperform neural networks while requiring significantly less training time and fewer parameters, thereby eliminating the need for extensive hyper-parameter tuning.
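The first two stages described in the abstract (TF-IDF feature extraction followed by feature selection) can be illustrated with a minimal pure-Python sketch. The abstract does not name the FS techniques that were evaluated, so the chi-square filter used here is an assumption chosen for illustration, as are the smoothed IDF formula, the toy reviews, and all function names (`tfidf_vocabulary`, `chi2_scores`, `select_top_k`). None of this is the authors' implementation.

```python
import math
from collections import Counter


def tfidf_vocabulary(docs):
    """Return (vocab, matrix); matrix[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Assumption: smoothed IDF, ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        matrix.append([counts[t] * idf[t] for t in vocab])
    return vocab, matrix


def chi2_scores(matrix, labels):
    """Chi-square score per feature: observed vs. expected feature mass per class."""
    n = len(matrix)
    classes = sorted(set(labels))
    scores = []
    for j in range(len(matrix[0])):
        total = sum(row[j] for row in matrix)
        score = 0.0
        for c in classes:
            n_c = sum(1 for y in labels if y == c)
            observed = sum(row[j] for row, y in zip(matrix, labels) if y == c)
            expected = (n_c / n) * total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(vocab, scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in ranked[:k]]


# Toy corpus (hypothetical data, for illustration only).
docs = [
    "good great film",
    "great acting good plot",
    "bad boring film",
    "boring plot bad acting",
]
labels = ["pos", "pos", "neg", "neg"]

vocab, matrix = tfidf_vocabulary(docs)
scores = chi2_scores(matrix, labels)
selected = select_top_k(vocab, scores, k=4)
print(selected)
```

In this toy corpus the terms that occur in only one class receive the highest scores, while terms spread evenly over both classes score zero and are dropped. The reduced vocabulary would then be passed to a classifier such as LR, SVM, DT, or NB.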

Related content

Feature selection

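Feature selection, also called feature subset selection (FSS) or attribute selection, is the process of choosing N features from M existing features so that a specific system metric is optimized. It picks the most effective of the original features in order to reduce the dimensionality of the dataset. It is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are the key to training a model.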
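The "choose N of M features so that a metric is optimized" formulation can be sketched as a greedy forward search. This is one common wrapper-style strategy, shown here as an assumption rather than as the method of any particular work. The metric (training accuracy of a nearest-centroid classifier), the toy data, and the names `greedy_forward_selection` and `nearest_centroid_accuracy` are all hypothetical.

```python
def nearest_centroid_accuracy(rows, labels, subset):
    """Training accuracy of a nearest-centroid classifier restricted to `subset`."""
    classes = sorted(set(labels))
    centroids = {}
    for c in classes:
        members = [r for r, y in zip(rows, labels) if y == c]
        centroids[c] = [sum(r[j] for r in members) / len(members) for j in subset]
    correct = 0
    for r, y in zip(rows, labels):
        point = [r[j] for j in subset]
        # Predict the class whose centroid is closest (squared Euclidean distance).
        pred = min(
            classes,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == y
    return correct / len(rows)


def greedy_forward_selection(n_features, n_select, metric):
    """Pick n_select of n_features indices, adding the best-scoring one each round."""
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda j: metric(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): M = 4 features, only feature 0 separates the classes.
rows = [
    [1.0, 5.0, 0.2, 3.0],
    [0.9, 1.0, 0.8, 3.1],
    [0.1, 5.2, 0.3, 2.9],
    [0.0, 0.9, 0.7, 3.0],
]
labels = [1, 1, 0, 0]

chosen = greedy_forward_selection(
    n_features=4,
    n_select=2,
    metric=lambda subset: nearest_centroid_accuracy(rows, labels, subset),
)
print(chosen)
```

Greedy search evaluates far fewer subsets than an exhaustive search over all N-of-M combinations, at the cost of possibly missing the best subset. In practice the metric would be measured on held-out data rather than on the training set.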