Multiscale Positive-Unlabeled Detection of AI-Generated Texts

June 2, 2023

Yuchuan Tian, Hanting Chen, Xutao Wang, Zheyuan Bai, Qinghua Zhang, Ruifeng Li, Chao Xu, Yunhe Wang

Recently released Large Language Models (LLMs) such as ChatGPT are remarkably good at generating human-like text, but they can be misused to produce fake scholarly texts, fake news, fake tweets, and so on. Previous work has proposed methods to detect these AI-generated texts, including simple ML classifiers, training-agnostic methods based on pretrained models, and finetuned language classification models. However, mainstream detectors are formulated without considering the length of the text: shorter texts are harder to detect than longer ones because they contain fewer informative features. This paper proposes a Multiscale Positive-Unlabeled (MPU) training framework to address the challenge of multiscale text detection. First, we acknowledge that short machine-generated texts resemble human text, and we recast text classification as a Positive-Unlabeled (PU) problem by marking these short machine texts as "unlabeled" during training. In this PU context, we propose the length-sensitive Multiscale PU Loss, in which a recurrent model, used as an abstraction, estimates the positive priors of texts of varying length. Additionally, we introduce a Text Multiscaling module to enrich the training corpora. Experiments show that the MPU method improves detection performance on long AI-generated texts and significantly improves short-text detection for language-model-based detectors. Language models trained with MPU can outperform existing detectors by large margins on multiscale AI-generated texts. The code is available at https://github.com/mindspore-lab/mindone/tree/master/examples/detect_chatgpt and https://github.com/YuchuanTian/AIGC_text_detector.
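
The two ingredients named in the abstract, a length-sensitive PU loss and a text multiscaling step, can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the repositories linked above for that). It assumes the standard non-negative PU risk with a sigmoid surrogate loss, applies the prior per positive sample, and uses a hypothetical `toy_prior` that simply decays with text length in place of the recurrent-model estimate described in the paper. The names `multiscale_pu_loss`, `multiscale_text`, `keep_prob`, and `scale` are illustrative, and random sentence dropping is only one plausible way to produce shorter training texts.

```python
import math
import random
import re


def sigmoid_loss(score: float, target: int) -> float:
    """Sigmoid surrogate loss l(z) = 1 / (1 + exp(z)) with z = target * score."""
    return 1.0 / (1.0 + math.exp(target * score))


def toy_prior(length: int, scale: float = 50.0) -> float:
    """Hypothetical length-dependent positive prior in (0, 1].

    Placeholder assumption: the prior decays with length, so short texts get
    a larger prior. The paper estimates its prior with a recurrent model.
    """
    return math.exp(-length / scale)


def multiscale_pu_loss(pos_scores, pos_lengths, unl_scores, prior_fn=toy_prior):
    """Non-negative PU risk with a per-sample, length-dependent prior (a sketch)."""
    priors = [prior_fn(n) for n in pos_lengths]
    n_pos = len(pos_scores)
    # Prior-weighted risk of positive samples scored against the positive label.
    risk_p_pos = sum(p * sigmoid_loss(s, +1) for p, s in zip(priors, pos_scores)) / n_pos
    # Prior-weighted risk of positive samples scored against the negative label.
    risk_p_neg = sum(p * sigmoid_loss(s, -1) for p, s in zip(priors, pos_scores)) / n_pos
    # Risk of unlabeled samples scored against the negative label.
    risk_u_neg = sum(sigmoid_loss(s, -1) for s in unl_scores) / len(unl_scores)
    # Clamp the estimated negative-class risk at zero (non-negative correction).
    return risk_p_pos + max(0.0, risk_u_neg - risk_p_neg)


def multiscale_text(text: str, keep_prob: float = 0.75, rng=None) -> str:
    """Shorten a text by keeping each sentence independently with keep_prob."""
    if rng is None:
        rng = random.Random()
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    kept = [s for s in sentences if rng.random() < keep_prob]
    if not kept and sentences:
        # Never return an empty training sample.
        kept = [rng.choice(sentences)]
    return " ".join(kept)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = "First sentence here. Second one follows! Is this the third? Yes."
    print(multiscale_text(sample, keep_prob=0.5, rng=rng))
    # Scores are raw classifier outputs; lengths are token counts (toy values).
    print(multiscale_pu_loss([2.0, -1.0], [10, 200], [0.5, -3.0]))
```

In this sketch the prior only reweights the two positive-sample terms, so a text with a larger prior contributes more to both the positive risk and the correction subtracted from the unlabeled risk. Whether the positive class stands for human-written or machine-generated text is left to the caller.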
