Building general-purpose graphical user interface (GUI) agents has become increasingly promising with the progress of vision-language models. However, developing effective mobile GUI agents with reinforcement learning (RL) remains challenging due to the heavy-tailed distribution of task difficulty and the inefficiency of large-scale environment sampling. We present MobileRL, an online agentic reinforcement learning framework that enhances GUI agents in mobile environments. Its core component is the Difficulty-ADAptive GRPO (ADAGRPO) algorithm. In ADAGRPO, we design difficulty-adaptive positive replay and failure curriculum filtering to adapt the model to different task difficulties. We further introduce a shortest-path reward adjustment strategy that reshapes rewards according to task length in multi-turn agentic tasks. These strategies jointly stabilize RL training, improve sample efficiency, and yield strong performance across diverse mobile apps and tasks. We apply MobileRL to two open models (Qwen2.5-VL-7B-Instruct and GLM-4.1V-9B-Base). The resulting MobileRL-9B model achieves state-of-the-art success rates on both AndroidWorld (80.2%) and AndroidLab (53.6%). The MobileRL framework is open-sourced at https://github.com/THUDM/MobileRL.
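To make the three abstract-level ideas concrete, the following is a minimal illustrative sketch, not the authors' implementation: it assumes hypothetical helpers (`shortest_path_reward`, `build_group`), a per-task `replay_buffer`, and rollout records with `success`, `steps`, and an assumed reference shortest-path length `ref_steps`.

```python
import random

def shortest_path_reward(success: bool, steps: int, ref_steps: int) -> float:
    # Hypothetical shaping: a success earns more the closer its trajectory
    # length is to an assumed reference shortest path; failures earn 0.
    if not success:
        return 0.0
    return min(1.0, ref_steps / max(steps, 1))

def build_group(task, rollouts, replay_buffer, max_group=8):
    """Assemble one GRPO group for a task, sketching two ideas named in the
    abstract: difficulty-adaptive positive replay (mix stored successes back
    into groups for hard tasks) and failure curriculum filtering (defer tasks
    with no success at all)."""
    successes = [r for r in rollouts if r["success"]]
    failures = [r for r in rollouts if not r["success"]]
    stored = replay_buffer.get(task, [])

    # Failure curriculum filtering (sketch): skip tasks with no fresh or
    # replayed success, so all-failure groups do not dominate training.
    if not successes and not stored:
        return None

    # Difficulty-adaptive positive replay (sketch): the fewer fresh
    # successes, the more stored positives are mixed back in.
    need = max(0, max_group // 2 - len(successes))
    replayed = random.sample(stored, k=min(need, len(stored)))
    group = (successes + replayed + failures)[:max_group]

    for r in group:
        r["reward"] = shortest_path_reward(r["success"], r["steps"], r["ref_steps"])
    replay_buffer.setdefault(task, []).extend(successes)
    return group
```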