While CodeMem establishes executable code as the optimal representation for agentic procedural memory, the mechanism for autonomously synthesizing this memory from a blank slate remains underexplored. This paper operationalizes the transition of Large Language Models from passive tool users to active workflow architects. Through a high-fidelity case study of a cross-service orchestration task involving Outlook and OneDrive, we identify and address four structural bottlenecks in automated skill generation: the Discovery Gap (navigating large tool registries), the Verification Gap (grounding tool response structures), the Decomposition Gap (addressed by replacing inefficient search with Linear State Anchoring), and the Scaling Gap (concurrency and persistence). We demonstrate that by enforcing a scientific methodology of hypothesize, probe, and code, agents can autonomously write robust, production-grade code skills.