Semantic reasoning over sentence pairs is essential for many natural language understanding tasks, e.g., natural language inference (NLI) and machine reading comprehension (MRC). A recent significant improvement on these tasks comes from BERT. As reported, the next sentence prediction (NSP) task in BERT, which learns the contextual relationship between two sentences, is of great significance for downstream problems with sentence-pair input. Despite its effectiveness, we argue that NSP still lacks the essential signal to distinguish entailment from shallow correlation. To remedy this, we propose to extend NSP into a 3-class classification task that adds a category for previous sentence prediction (PSP). Incorporating PSP encourages the model to focus on the informative semantics needed to determine sentence order, thereby improving its semantic understanding. This simple modification yields a remarkable improvement over vanilla BERT. To further incorporate document-level information, the scope of NSP and PSP is expanded to a broader range, i.e., both tasks also cover close but non-successive sentences, whose label noise is mitigated by the label-smoothing technique. Both qualitative and quantitative experimental results demonstrate the effectiveness of the proposed method. Our method consistently improves performance on NLI and MRC benchmarks, including the challenging HANS dataset \cite{hans}, suggesting that document-level tasks remain promising for pre-training.
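To make the training objective concrete, the following is a minimal sketch of the 3-class sentence-order loss described above, with label smoothing applied to soften the targets for close but non-successive pairs. The class indices, smoothing factor, and helper names are illustrative assumptions, not the paper's actual implementation.

```python
import math

# Assumed class layout for the 3-way sentence-order task:
# 0 = next sentence (NSP), 1 = previous sentence (PSP), 2 = other.
NUM_CLASSES = 3

def smoothed_target(true_class, epsilon=0.1):
    """Label-smoothed target distribution: move `epsilon` probability mass
    from the true class and spread it uniformly over the other classes.
    `epsilon` here is an illustrative choice, not the paper's value."""
    off = epsilon / (NUM_CLASSES - 1)
    return [1.0 - epsilon if c == true_class else off
            for c in range(NUM_CLASSES)]

def cross_entropy(probs, target):
    """Cross-entropy between predicted probabilities and a (soft) target."""
    return -sum(t * math.log(p) for p, t in zip(probs, target))

# A close-but-non-successive "next" pair gets a softened NSP label,
# so the model is penalized less for mild ordering ambiguity.
soft = smoothed_target(0)           # [0.9, 0.05, 0.05]
loss = cross_entropy([0.7, 0.2, 0.1], soft)
```

Hard labels are recovered by `epsilon=0.0`; in practice the smoothed loss would be averaged over sampled sentence pairs alongside the masked language modeling objective.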