With advancements in hardware, software, and large language model technologies, the interaction between humans and operating systems has evolved from command-line interfaces to rapidly emerging AI agent-based interaction. Building an operating system (OS) agent capable of executing user instructions and faithfully following user intent is becoming a reality. In this technical report, we present ColorAgent, an OS agent designed to engage in long-horizon, robust interactions with the environment while also enabling personalized and proactive user interaction. To support long-horizon interaction with the environment, we enhance the model's capabilities through step-wise reinforcement learning and self-evolving training, and we develop a tailored multi-agent framework that ensures generality, consistency, and robustness. In terms of user interaction, we explore personalized user intent recognition and proactive engagement, positioning the OS agent not merely as an automation tool but as a warm, collaborative partner. We evaluate ColorAgent on the AndroidWorld and AndroidLab benchmarks, achieving success rates of 77.2% and 50.7%, respectively, establishing a new state of the art. Nonetheless, we note that current benchmarks are insufficient for a comprehensive evaluation of OS agents and outline directions for future work, particularly in the areas of evaluation paradigms, agent collaboration, and security. Our code is available at https://github.com/MadeAgents/mobile-use.