Text classification is a fundamental task in natural language processing (NLP), and several recent studies have demonstrated the success of deep learning on text processing. The convolutional neural network (CNN), a popular deep learning model, has been particularly successful at text classification. This paper studies new CNN-based baseline models for text classification. In these models, each document is fed to the network as a three-dimensional tensor, which allows the text to be analysed at the sentence level. This representation lets the models exploit the positional information of sentences within the text, and analysing adjacent sentences together allows additional features to be extracted. The proposed models are compared with state-of-the-art models on several datasets. The results show that the proposed models perform better, particularly on longer documents.
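As an illustration of the three-dimensional input described above, the following minimal sketch maps a document to a tensor indexed by sentence, word position, and embedding dimension. The abstract does not specify the tokenizer, padding scheme, or embeddings, so the function name `document_to_tensor`, the naive period-based sentence split, the zero padding, and the toy embedding table are all assumptions made for illustration only.

```python
def document_to_tensor(doc, embeddings, max_sents, max_words, embed_dim):
    """Return a nested list of shape (max_sents, max_words, embed_dim).

    Hypothetical helper: sentences are split on periods and words on
    whitespace, which is an assumption rather than the paper's method.
    """
    zero = [0.0] * embed_dim
    # Naive sentence and word segmentation (assumed for illustration).
    sentences = [s.split() for s in doc.lower().split(".") if s.strip()]
    tensor = []
    for i in range(max_sents):
        # Sentences beyond the document length are treated as empty (padding).
        words = sentences[i] if i < len(sentences) else []
        row = []
        for j in range(max_words):
            if j < len(words):
                # Unknown words fall back to the zero vector.
                row.append(list(embeddings.get(words[j], zero)))
            else:
                row.append(list(zero))
        tensor.append(row)
    return tensor


# Toy two-dimensional embeddings, invented for this example.
toy_embeddings = {
    "cats": [1.0, 0.0],
    "sleep": [0.0, 1.0],
    "dogs": [1.0, 1.0],
    "bark": [0.5, 0.5],
}
tensor = document_to_tensor(
    "Cats sleep. Dogs bark.", toy_embeddings,
    max_sents=3, max_words=4, embed_dim=2,
)
shape = (len(tensor), len(tensor[0]), len(tensor[0][0]))
```

Because the first axis indexes sentences in document order, a convolution whose kernel spans more than one row along that axis would see adjacent sentences together, which is one plausible reading of how positional and neighbouring-sentence features could be extracted.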