Computer-using agents powered by Vision-Language Models (VLMs) have demonstrated human-like capabilities in operating digital environments such as mobile platforms. While these agents hold great promise for advancing digital automation, their potential for unsafe operations, such as system compromise and privacy leakage, raises significant concerns. Detecting these safety risks across the vast and complex operational space of mobile environments presents a formidable challenge that remains critically underexplored. To establish a foundation for mobile agent safety research, we introduce MobileRisk-Live, a dynamic sandbox environment accompanied by a safety detection benchmark comprising realistic trajectories with fine-grained annotations. Built upon this, we propose OS-Sentinel, a novel hybrid safety detection framework that synergistically combines a Formal Verifier for detecting explicit system-level violations with a VLM-based Contextual Judge for assessing contextual risks and agent actions. Experiments show that OS-Sentinel achieves 10%-30% improvements over existing approaches across multiple metrics. Further analysis provides critical insights that foster the development of safer and more reliable autonomous mobile agents. Our code and data are available at https://github.com/OS-Copilot/OS-Sentinel.