ChatGPT(-3.5,-4)生成和人写论文的鉴别: 基于日语文体学的分析 (Distinguishing ChatGPT(-3.5, -4)-generated and human-written papers through Japanese stylometric analysis)

Text-generative artificial intelligence (AI), including ChatGPT, equipped with GPT-3.5 and GPT-4, from OpenAI, has attracted considerable attention worldwide. In this study, first, we compared Japanese stylometric features generated by GPT (-3.5 and -4) and those written by humans. In this work, we performed multi-dimensional scaling (MDS) to confirm the distributions of 216 texts of three classes (72 academic papers written by 36 single authors, 72 texts generated by GPT-3.5, and 72 texts generated by GPT-4 on the basis of the titles of the aforementioned papers) focusing on the following stylometric features: (1) bigrams of parts-of-speech, (2) bigram of postpositional particle words, (3) positioning of commas, and (4) rate of function words. MDS revealed distinct distributions at each stylometric feature of GPT (-3.5 and -4) and human. Although GPT-4 is more powerful than GPT-3.5 because it has more parameters, both GPT (-3.5 and -4) distributions are likely to overlap. These results indicate that although the number of parameters may increase in the future, AI-generated texts may not be close to that written by humans in terms of stylometric features. Second, we verified the classification performance of random forest (RF) for two classes (GPT and human) focusing on Japanese stylometric features. This study revealed the high performance of RF in each stylometric feature. Furthermore, the RF classifier focusing on the rate of function words achieved 98.1% accuracy. The RF classifier focusing on all stylometric features reached 100% in terms of all performance indexes (accuracy, recall, precision, and F1 score). This study concluded that at this stage we human discriminate ChatGPT from human limited to Japanese language.

翻译：文本生成的人工智能(ChatGPT)，包括OpenAI的GPT-3.5和GPT-4，已经在全球范围内引起了极大的关注。本研究首先比较了由GPT(-3.5和-4)生成的日语文体特征和人类写作的文体特征。本研究采用多维缩放(MDS)分析来确认三类216篇文本(36位单一作者撰写的72篇学术论文、基于前述论文标题生成的72篇GPT-3.5文本和72篇GPT-4文本)的分布情况，侧重以下文体特征: (1) 词性二元组，(2) 助词二元组，(3) 逗号位置，和 (4) 功能词比率。MDS 揭示了 GPT(-3.5和-4)生成文本和人类写作文本在每个文体特征上都有明显不同的分布。虽然 GPT-4 比 GPT-3.5 功率更大，因为它具有更多的参数，但是 GPT(-3.5和-4) 的分布可能会发生重叠。这些结果表明虽然未来参数数量可能会增加，但是在文体特征方面，人工智能生成的文本可能与人类撰写的文本不会接近。其次，我们通过随机森林(RF)分类器验证了以文体特征为中心的两类(GPT和人类)分类性能，本研究揭示了每种文体特征的高性能RF分类器。此外，以功能词比率为中心的RF分类器达到98.1%的准确率，以全部文体特征为中心的RF分类器在所有性能指标(准确率、召回率、精确率和 F1 分数)上达到了100%。本研究的结论是，在目前这个阶段，以日语为例，我们人类可以区分 ChatGPT 和人类撰写的文本。

相关内容

分类器

关注 6

分类是数据挖掘的一种非常重要的方法。分类的概念是在已有数据的基础上学会一个分类函数或构造出一个分类模型（即我们通常所说的分类器(Classifier)）。该函数或模型能够把数据库中的数据纪录映射到给定类别中的某一个，从而可以应用于数据预测。总之，分类器是数据挖掘中对样本进行分类的方法的统称，包含决策树、逻辑回归、朴素贝叶斯、神经网络等算法。

揭秘ChatGPT情感对话能力

专知会员服务

56+阅读 · 2023年4月9日

【百度&北京大学】自然语言生成的保真性:分析、评价和优化方法的系统综述，Faithfulness in Natural Language Generation: A Systematic Survey of Analysis, Evaluation and Optimization Methods

专知会员服务

14+阅读 · 2022年3月11日

【英国萨里大学】神经文本生成的研究进展:任务无关的综述，Recent Advances in Neural Text Generation: A Task-Agnostic Survey

专知会员服务

17+阅读 · 2022年3月8日

【PAISS 2021 教程】概率散度与生成式模型，92页ppt

专知会员服务

32+阅读 · 2021年11月30日