Practical and Stealthy Touch-Guided Jailbreak Attacks on Deployed Mobile Vision-Language Agents

Large vision-language models (LVLMs) enable autonomous mobile agents to operate smartphone user interfaces, yet vulnerabilities in their perception and interaction remain critically understudied. Existing research often relies on conspicuous overlays, elevated permissions, or unrealistic threat assumptions, which limits stealth and real-world feasibility. In this paper, we introduce a practical and stealthy jailbreak attack framework with three key components: (i) non-privileged perception compromise, which injects visual payloads into the application interface without requiring elevated system permissions; (ii) agent-attributable activation, which leverages input attribution signals to distinguish agent interactions from human ones and limits prompt exposure to transient intervals, keeping the attack hidden from end users; and (iii) efficient one-shot jailbreak, which uses a heuristic iterative deepening search algorithm (HG-IDA*) to perform keyword-level detoxification and bypass the built-in safety alignment of LVLMs. In addition, we developed three representative Android applications and curated a prompt-injection dataset for mobile agents. We evaluated our attack across multiple LVLM backends, including closed-source services and representative open-source models, and observed high planning and execution hijack rates (e.g., GPT-4o: 82.5% planning / 75.0% execution). These results expose a fundamental security vulnerability in current mobile agents and carry critical implications for autonomous smartphone operation.
