We show that large language models (LLMs) can be used to distinguish the writings of different authors. Specifically, an individual GPT-2 model, trained from scratch on the works of one author, will predict held-out text from that author more accurately than held-out text from other authors. We suggest that, in this way, a model trained on one author's works embodies the unique writing style of that author. We first demonstrate our approach on books written by eight different (known) authors. We also use this approach to confirm R. P. Thompson's authorship of the well-studied 15th book of the Oz series, originally attributed to L. Frank Baum.
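The decision rule described above (train one model per author, then attribute a held-out text to the author whose model predicts it best) can be sketched in a few lines. This is only a minimal illustration under stated assumptions: a tiny add-one-smoothed character bigram model stands in for the per-author GPT-2, and the class `CharBigramModel`, the function `attribute`, and the toy corpora are hypothetical names and data invented here, not taken from the paper.

```python
import math
from collections import Counter


class CharBigramModel:
    """Add-one-smoothed character bigram model.

    Assumption: this is a lightweight stand-in for a per-author GPT-2,
    used only to illustrate the comparison of predictive losses.
    """

    def __init__(self, text, alphabet):
        self.alphabet_size = len(alphabet)
        # Counts of (previous char, next char) pairs and of each context char.
        self.pair_counts = Counter(zip(text, text[1:]))
        self.context_counts = Counter(text[:-1])

    def cross_entropy(self, text):
        """Average negative log-probability (nats per character) of `text`."""
        total = 0.0
        for prev, nxt in zip(text, text[1:]):
            numerator = self.pair_counts[(prev, nxt)] + 1
            denominator = self.context_counts[prev] + self.alphabet_size
            total -= math.log(numerator / denominator)
        return total / max(len(text) - 1, 1)


def attribute(models, held_out):
    """Return the author whose model gives `held_out` the lowest loss, plus all losses."""
    losses = {author: model.cross_entropy(held_out) for author, model in models.items()}
    return min(losses, key=losses.get), losses


# Toy corpora, invented purely for illustration.
corpora = {
    "author_a": "the cat sat on the mat and the cat ate the rat " * 20,
    "author_b": "quantum flux vexes jolly zephyrs who quiz brave knights " * 20,
}
alphabet = set("".join(corpora.values()))
models = {name: CharBigramModel(text, alphabet) for name, text in corpora.items()}

predicted, losses = attribute(models, "the rat sat on the cat")
print(predicted, losses)
```

In this sketch the held-out sentence is assigned to `author_a`, because that author's model gives it the lower cross-entropy. With real data, each stand-in model would be replaced by a language model trained from scratch on one author's books, and the same lowest-loss comparison would be applied to held-out passages.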