Multiscale Positive-Unlabeled Detection of AI-Generated Texts - 专知论文

会员服务 ·

0

语言模型化 · MoDELS · HTTPS · INFORMS · Performer ·

2023 年 5 月 29 日

Multiscale Positive-Unlabeled Detection of AI-Generated Texts

翻译：暂无翻译

Yuchuan Tian,Hanting Chen,Xutao Wang,Zheyuan Bai,Qinghua Zhang,Ruifeng Li,Chao Xu,Yunhe Wang

Recent releases of Large Language Models (LLMs), e.g. ChatGPT, are astonishing at generating human-like texts, but they may get misused for fake scholarly texts, fake news, fake tweets, et cetera. Previous works have proposed methods to detect these multiscale AI-generated texts, including simple ML classifiers, pretrained-model-based training-agnostic methods, and finetuned language classification models. However, mainstream detectors are formulated without considering the factor of corpus length: shorter corpuses are harder to detect compared with longer ones for shortage of informative features. In this paper, a Multiscale Positive-Unlabeled (MPU) training framework is proposed to address the challenge of multiscale text detection. Firstly, we acknowledge the human-resemblance property of short machine texts, and rephrase text classification as a Positive-Unlabeled (PU) problem by marking these short machine texts as "unlabeled" during training. In this PU context, we propose the length-sensitive Multiscale PU Loss, where we use a recurrent model in abstraction to estimate positive priors of scale-variant corpuses. Additionally, we introduce a Text Multiscaling module to enrich training corpuses. Experiments show that our MPU method augments detection performance on long AI-generated text, and significantly improves short-corpus detection of language model detectors. Language Models trained with MPU could outcompete existing detectors by large margins on multiscale AI-generated texts. The codes are available at https://github.com/mindspore-lab/mindone/tree/master/examples/detect_chatgpt and https://github.com/huawei-noah/Efficient-Computing/AIGC_text_detector.

翻译：暂无翻译

0

相关内容

语言模型化

语言模型化

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

浅聊对比学习（Contrastive Learning）

浅聊对比学习（Contrastive Learning）

极市平台

2+阅读 · 2022年7月26日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

AINLP

10+阅读 · 2019年2月9日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】深度学习目标检测概览

【推荐】深度学习目标检测概览

机器学习研究会

10+阅读 · 2017年9月1日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

微丝解聚因子ADF1新互作蛋白分子的基因功能研究

国家自然科学基金

0+阅读 · 2014年12月31日

Vaspin在胰岛β细胞炎症、胰岛素抵抗及氧化应激中的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于计算机视觉的蓝藻水力剪切效应研究

国家自然科学基金

1+阅读 · 2013年12月31日

雄激素受体在膀胱癌进展中对GATA3的调控机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

HIV-1 Nef蛋白促进KSHV K1诱导血管和肿瘤形成：信号通路与miRNAs的作用

国家自然科学基金

0+阅读 · 2012年12月31日

DEC1、DEC2对人乳腺癌细胞衰老的调控作用及其作用机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

维生素E琥珀酸酯诱导胃癌细胞凋亡过程中内质网应激与氧化应激的交互作用

国家自然科学基金

0+阅读 · 2011年12月31日

基于Decorin基因甲基化调控的非小细胞肺癌转移的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

紫色土容许土壤流失量主要影响因素的作用机制

国家自然科学基金

0+阅读 · 2009年12月31日

GmMADS1在大豆花发育中的调控机理研究

国家自然科学基金

0+阅读 · 2008年12月31日

Cross-lingual Cross-temporal Summarization: Dataset, Models, Evaluation

Arxiv

0+阅读 · 2023年7月13日

Improving 2D Human Pose Estimation across Unseen Camera Views with Synthetic Data

Arxiv

0+阅读 · 2023年7月13日

Automated Deception Detection from Videos: Using End-to-End Learning Based High-Level Features and Classification Approaches

Arxiv

0+阅读 · 2023年7月13日

Multiple Testing Framework for Out-of-Distribution Detection

Arxiv

0+阅读 · 2023年7月12日

Towards Large-Scale Small Object Detection: Survey and Benchmarks

Arxiv

40+阅读 · 2022年7月28日

Generalized Out-of-Distribution Detection: A Survey

Generalized Out-of-Distribution Detection: A Survey

Arxiv

15+阅读 · 2021年10月21日

Model-Contrastive Federated Learning

Arxiv

10+阅读 · 2021年3月30日

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

Arxiv

12+阅读 · 2020年12月14日

Co-mining: Self-Supervised Learning for Sparsely Annotated Object Detection

Arxiv

13+阅读 · 2020年12月3日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

VIP会员

文章信息

相关主题

语言模型化

相关VIP内容

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

数据要素发展报告(2025年)：附下载

人工智能代理提升战时舰船战备水平

【NeurIPS2025教程】大语言模型规划

NeurIPS 2025 教程：深度学习训练不稳定性的理论洞见

相关资讯

浅聊对比学习（Contrastive Learning）

浅聊对比学习（Contrastive Learning）

极市平台

2+阅读 · 2022年7月26日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

AINLP

10+阅读 · 2019年2月9日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】深度学习目标检测概览

【推荐】深度学习目标检测概览

机器学习研究会

10+阅读 · 2017年9月1日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

相关论文

Cross-lingual Cross-temporal Summarization: Dataset, Models, Evaluation

Arxiv

0+阅读 · 2023年7月13日

Improving 2D Human Pose Estimation across Unseen Camera Views with Synthetic Data

Arxiv

0+阅读 · 2023年7月13日

Automated Deception Detection from Videos: Using End-to-End Learning Based High-Level Features and Classification Approaches

Arxiv

0+阅读 · 2023年7月13日

Multiple Testing Framework for Out-of-Distribution Detection

Arxiv

0+阅读 · 2023年7月12日

Towards Large-Scale Small Object Detection: Survey and Benchmarks

Arxiv

40+阅读 · 2022年7月28日

Generalized Out-of-Distribution Detection: A Survey

Generalized Out-of-Distribution Detection: A Survey

Arxiv

15+阅读 · 2021年10月21日

Model-Contrastive Federated Learning

Arxiv

10+阅读 · 2021年3月30日

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

Arxiv

12+阅读 · 2020年12月14日

Co-mining: Self-Supervised Learning for Sparsely Annotated Object Detection

Arxiv

13+阅读 · 2020年12月3日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

相关基金

微丝解聚因子ADF1新互作蛋白分子的基因功能研究

国家自然科学基金

0+阅读 · 2014年12月31日

Vaspin在胰岛β细胞炎症、胰岛素抵抗及氧化应激中的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于计算机视觉的蓝藻水力剪切效应研究

国家自然科学基金

1+阅读 · 2013年12月31日

雄激素受体在膀胱癌进展中对GATA3的调控机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

HIV-1 Nef蛋白促进KSHV K1诱导血管和肿瘤形成：信号通路与miRNAs的作用

国家自然科学基金

0+阅读 · 2012年12月31日

DEC1、DEC2对人乳腺癌细胞衰老的调控作用及其作用机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

维生素E琥珀酸酯诱导胃癌细胞凋亡过程中内质网应激与氧化应激的交互作用

国家自然科学基金

0+阅读 · 2011年12月31日

基于Decorin基因甲基化调控的非小细胞肺癌转移的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

紫色土容许土壤流失量主要影响因素的作用机制

国家自然科学基金

0+阅读 · 2009年12月31日

GmMADS1在大豆花发育中的调控机理研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员