Generating coherent and cohesive long-form text is a challenging problem in natural language generation. Previous work has relied on large amounts of human-generated text to train language models, but few attempts have explicitly modeled the desired linguistic properties of natural language text, such as coherence and cohesion. In this work, we train two expert discriminators, one each for coherence and cohesion, to provide hierarchical feedback for text generation. We also propose a simple variant of policy gradient, called 'negative-critical sequence training', which uses margin rewards and constructs the 'baseline' from randomly generated negative samples. We demonstrate the effectiveness of our approach through empirical studies, showing significant improvements on a number of automated metrics over a strong baseline: an attention-based bidirectional MLE-trained neural language model. The proposed discriminators can serve as baseline architectures to promote further research on better extracting and encoding essential linguistic qualities such as coherence and cohesion.
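The margin-reward idea above can be sketched as follows. This is a minimal, hypothetical illustration (not the authors' implementation): it assumes each sampled sequence has a summed log-probability under the generator and a scalar discriminator score, and that each negative sample's score serves as the REINFORCE baseline. Autograd and batching details are omitted.

```python
import math

def negative_critical_loss(log_probs, sample_scores, negative_scores):
    """Sketch of a policy-gradient loss with a negative-sample baseline.

    log_probs:       summed log-probabilities of the sampled sequences
    sample_scores:   discriminator scores for the sampled sequences
    negative_scores: discriminator scores for randomly generated negatives

    The margin (sample score minus negative score) replaces the usual
    learned baseline, weighting each sequence's log-likelihood.
    """
    losses = []
    for lp, s, n in zip(log_probs, sample_scores, negative_scores):
        margin = s - n          # margin reward relative to the negative baseline
        losses.append(-margin * lp)  # REINFORCE-style: reward positive margins
    return sum(losses) / len(losses)

# Example: two sampled sequences that both beat their negatives by 0.5.
loss = negative_critical_loss(
    log_probs=[math.log(0.5), math.log(0.25)],
    sample_scores=[0.9, 0.8],
    negative_scores=[0.4, 0.3],
)
```

In a real training loop the margin would be treated as a constant (detached from the graph) and gradients would flow only through the log-probabilities.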