Multilingual Hidden Prompt Injection Attacks on LLM-Based Academic Reviewing

Prompt injection · Attacks · Prompt injection attacks · Language models · Papers

Panagiotis Theocharopoulos, Ajinkya Kulkarni, Mathew Magimai.-Doss

Large language models (LLMs) are increasingly considered for use in high-impact workflows, including academic peer review. However, LLMs are vulnerable to document-level hidden prompt injection attacks. In this work, we construct a dataset of approximately 500 real academic papers accepted to ICML and evaluate the effect of embedding hidden adversarial prompts within these documents. Each paper is injected with semantically equivalent instructions in four different languages and reviewed using an LLM. We find that prompt injection induces substantial changes in review scores and accept/reject decisions for English, Japanese, and Chinese injections, while Arabic injections produce little to no effect. These results highlight the susceptibility of LLM-based reviewing systems to document-level prompt injection and reveal notable differences in vulnerability across languages.
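The comparison the abstract describes (review scores and accept/reject decisions with and without an injected instruction, broken down by injection language) can be sketched as below. This is a minimal illustration, not the authors' evaluation code: the `ReviewPair` record, the `summarize` helper, the metric names, and the toy values are all assumptions made for this example, and the numbers are placeholders rather than results from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ReviewPair:
    """One paper reviewed twice: clean and with a hidden injection (hypothetical record)."""
    language: str           # language of the injected instruction
    baseline_score: float   # score for the clean paper
    injected_score: float   # score for the injected paper
    baseline_accept: bool   # decision for the clean paper
    injected_accept: bool   # decision for the injected paper


def summarize(pairs):
    """Group review pairs by injection language and compute two simple metrics."""
    by_language = {}
    for pair in pairs:
        by_language.setdefault(pair.language, []).append(pair)

    summary = {}
    for language, items in by_language.items():
        summary[language] = {
            "n": len(items),
            # Average change in score caused by the injection.
            "mean_score_shift": mean(
                p.injected_score - p.baseline_score for p in items
            ),
            # Fraction of papers whose accept/reject decision changed.
            "decision_flip_rate": sum(
                p.baseline_accept != p.injected_accept for p in items
            ) / len(items),
        }
    return summary


# Placeholder values for illustration only; they are NOT data from the paper.
toy_pairs = [
    ReviewPair("lang_a", 6.0, 9.0, True, True),
    ReviewPair("lang_a", 4.0, 8.0, False, True),
    ReviewPair("lang_b", 6.0, 6.0, True, True),
    ReviewPair("lang_b", 4.0, 4.0, False, False),
]

summary = summarize(toy_pairs)
for language, stats in summary.items():
    print(language, stats)
```

Reporting a per-language score shift alongside a decision flip rate keeps the two effects the abstract mentions separate: an injection can move scores without crossing the acceptance threshold, or flip decisions with only a small score change.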

