Argument mining is often addressed with a pipeline method in which text is first segmented into argumentative units and argument component identification follows as a separate task. In this research, we instead apply token-level classification to identify claim and premise tokens in a new corpus of argumentative essays written by middle school students. To this end, we compare a variety of state-of-the-art approaches, including models based on discrete features and deep learning architectures (e.g., BiLSTM networks and BERT-based architectures), for identifying the argument components. We demonstrate that a BERT-based multi-task learning architecture (i.e., joint token-level and sentence-level classification), adaptively pretrained on a relevant unlabeled dataset, obtains the best results.
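To make the token-level formulation concrete, the following is a minimal sketch of how span annotations could be turned into per-token targets, together with an auxiliary sentence-level target of the kind a multi-task setup might use. The BIO tag scheme, the function names `spans_to_bio` and `sentence_label`, the way the sentence label is derived, and the example sentence are all illustrative assumptions; the abstract does not specify any of them.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, label); a hypothetical annotation format.
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Convert labeled token spans into one BIO tag per token.

    Assumption: spans do not overlap; tokens outside any span get "O".
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"span ({start}, {end}) is out of range")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def sentence_label(tags: List[str]) -> str:
    """Derive an auxiliary sentence-level target from the token tags.

    Assumption: the sentence label names the component types present.
    """
    kinds = {tag.split("-", 1)[1] for tag in tags if tag != "O"}
    return "+".join(sorted(kinds)) if kinds else "none"


# Invented example sentence, not taken from the corpus.
tokens = "School should start later because teens need more sleep .".split()
spans = [(0, 4, "claim"), (5, 9, "premise")]

tags = spans_to_bio(len(tokens), spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
print("sentence target:", sentence_label(tags))
```

Under these assumptions, a token classifier is trained on `tags` while a second output head is trained on the sentence target, so that segmentation and component identification are learned jointly rather than in a pipeline.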