Controllable Text Generation with Language Constraints

We consider the task of text generation in language models with constraints specified in natural language. To this end, we first create a challenging benchmark, Cognac, that provides the model with a topic and example text as input, along with a constraint on text to be avoided. Unlike prior work, our benchmark contains knowledge-intensive constraints sourced from databases like WordNet and Wikidata, which allows for straightforward evaluation while striking a balance between broad attribute-level and narrow lexical-level controls. We find that even state-of-the-art language models like GPT-3 often fail on this task, and we propose a solution that leverages a language model's own internal knowledge to guide generation. Our method, called CognacGen, first queries the language model to generate guidance terms for a specified topic or constraint, and then uses the guidance to modify the model's token generation probabilities. We propose three forms of guidance (binary verifier, top-k tokens, textual example), and employ prefix-tuning approaches to distill the guidance so that it can handle diverse natural language constraints. Through extensive empirical evaluations, we demonstrate that CognacGen can successfully generalize to unseen instructions and outperform competitive baselines in generating constraint-conforming text.
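The guidance step described above (using guidance terms to modify token generation probabilities) can be illustrated with a toy sketch. This is not the CognacGen implementation: the abstract does not specify how the probabilities are adjusted, so the function name `apply_guidance`, the additive `boost` parameter, the hard blocking of constraint terms, and the four-word vocabulary are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Turn a {token: logit} dict into a {token: probability} dict."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def apply_guidance(logits, topic_terms, constraint_terms, boost=2.0):
    """Hypothetical guidance rule (an assumption, not the paper's formula).

    Tokens in constraint_terms are blocked outright, tokens in topic_terms
    get an additive logit bonus, and all other tokens are left unchanged.
    Assumes at least one token is not blocked.
    """
    adjusted = {}
    for tok, value in logits.items():
        if tok in constraint_terms:
            adjusted[tok] = float("-inf")  # probability becomes exactly 0
        elif tok in topic_terms:
            adjusted[tok] = value + boost
        else:
            adjusted[tok] = value
    return softmax(adjusted)


# Made-up next-token logits over a toy vocabulary.
logits = {"poodle": 2.0, "beagle": 1.5, "cat": 1.0, "the": 0.5}

# In the method described above, these term sets would come from querying
# the language model itself; here they are hard-coded for illustration.
probs = apply_guidance(
    logits,
    topic_terms={"beagle", "cat"},
    constraint_terms={"poodle"},
)
print(probs)
```

In this sketch the constraint acts as a hard filter while the topic acts as a soft preference. A softer penalty on constraint terms would be an equally plausible reading of "modify the model's token generation probabilities".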
