Controlled text generation is of high practical use. Recent work has made impressive progress in generating or editing sentences with given textual attributes (e.g., sentiment). This work studies a new practical setting: text content manipulation. Given a structured record, such as `(PLAYER: Lebron, POINTS: 20, ASSISTS: 10)', and a reference sentence, such as `Kobe easily dropped 30 points', we aim to generate a sentence that accurately describes the full content of the record in the same writing style (e.g., wording, transitions) as the reference. The problem is unsupervised because parallel data is lacking in practice, and it is challenging because the text must be manipulated minimally yet effectively (by rewriting, adding, or deleting text portions) to ensure fidelity to the structured content. We derive a dataset from a basketball game report corpus as our testbed and develop a neural method with unsupervised competing objectives and explicit content coverage constraints. Automatic and human evaluations show that our approach outperforms competitive methods, including a strong rule-based baseline and prior approaches designed for style transfer.
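To make the input and output of this setting concrete, the following is a minimal illustrative sketch in Python of a naive slot-substitution rewrite. It is not the neural method described above, nor the rule-based baseline used in the evaluation. The function name `naive_rewrite` and the `reference_record` argument (a record assumed to describe the reference sentence) are hypothetical and introduced only for illustration.

```python
from typing import Dict


def naive_rewrite(record: Dict[str, str],
                  reference: str,
                  reference_record: Dict[str, str]) -> str:
    """Toy content manipulation: keep the reference wording, swap in new values.

    Assumption: `reference_record` lists the field values that appear in
    `reference`. This is a hypothetical helper input, not part of the
    problem statement in the abstract.
    """
    sentence = reference
    uncovered = []
    for field, value in record.items():
        old_value = reference_record.get(field)
        if old_value is not None and old_value in sentence:
            # Rewrite: replace the first occurrence of the reference's value.
            sentence = sentence.replace(old_value, value, 1)
        else:
            # The reference has no slot for this field; remember it for later.
            uncovered.append(f"{value} {field.lower()}")
    if uncovered:
        # Add: append a plain clause so every record field is covered.
        sentence += " and " + " and ".join(uncovered)
    return sentence


record = {"PLAYER": "Lebron", "POINTS": "20", "ASSISTS": "10"}
reference = "Kobe easily dropped 30 points"
reference_record = {"PLAYER": "Kobe", "POINTS": "30"}

result = naive_rewrite(record, reference, reference_record)
print(result)  # Lebron easily dropped 20 points and 10 assists
```

Even this toy version shows why the task is hard. The substitution is plain string matching, so it can misfire when one value is a substring of another. The appended clause covers the extra field but ignores the style of the reference, and content in the reference that has no counterpart in the record is never deleted.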