Designing state encoders for reinforcement learning (RL) with multiple information sources -- such as sensor measurements, time-series signals, image observations, and textual instructions -- remains underexplored and still relies largely on manual, hand-crafted design. We formalize this challenge as a composite neural architecture search (NAS) problem in which multiple source-specific modules and a fusion module are jointly optimized. Existing NAS methods overlook useful side information from the intermediate outputs of these modules -- such as their representation quality -- which limits sample efficiency in multi-source RL settings. To address this, we propose a NAS pipeline driven by a large language model (LLM), in which the LLM serves as a neural architecture design agent, leveraging language-model priors and intermediate-output signals to guide a sample-efficient search for high-performing composite state encoders. On a mixed-autonomy traffic control task, our approach discovers higher-performing architectures with fewer candidate evaluations than traditional NAS baselines and the LLM-based GENIUS framework.
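To make the formalization concrete, the sketch below shows one possible composite state encoder in PyTorch: one source-specific module per information source (sensor measurements, time-series signals, image observations, and tokenized textual instructions) plus a fusion module, with each module's intermediate output exposed as the kind of side information the search procedure could consume. All module choices, dimensions, and names here are illustrative assumptions, not the architecture the paper searches over or discovers.

```python
# Minimal sketch of a composite state encoder: source-specific modules + fusion.
# All layer choices and sizes are illustrative assumptions for exposition only.
import torch
import torch.nn as nn


class CompositeStateEncoder(nn.Module):
    def __init__(self, sensor_dim=16, ts_channels=4, img_channels=3,
                 vocab_size=1000, latent_dim=64):
        super().__init__()
        # Source-specific module for low-dimensional sensor measurements.
        self.sensor_enc = nn.Sequential(
            nn.Linear(sensor_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        # Source-specific module for time-series signals (batch, channels, time).
        self.ts_enc = nn.Sequential(
            nn.Conv1d(ts_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, latent_dim))
        # Source-specific module for image observations (batch, C, H, W).
        self.img_enc = nn.Sequential(
            nn.Conv2d(img_channels, 16, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, latent_dim))
        # Source-specific module for tokenized textual instructions.
        self.text_enc = nn.EmbeddingBag(vocab_size, latent_dim, mode="mean")
        # Fusion module combining the per-source representations.
        self.fusion = nn.Sequential(
            nn.Linear(4 * latent_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim))

    def forward(self, sensors, timeseries, image, text_tokens):
        # Each intermediate output below is the kind of per-module signal
        # (e.g. a representation-quality probe) that could be fed back to the
        # architecture search as side information.
        zs = self.sensor_enc(sensors)
        zt = self.ts_enc(timeseries)
        zi = self.img_enc(image)
        zx = self.text_enc(text_tokens)
        state = self.fusion(torch.cat([zs, zt, zi, zx], dim=-1))
        return state, {"sensor": zs, "timeseries": zt, "image": zi, "text": zx}


if __name__ == "__main__":
    enc = CompositeStateEncoder()
    state, intermediates = enc(
        torch.randn(2, 16),             # sensor measurements
        torch.randn(2, 4, 50),          # time-series signals
        torch.randn(2, 3, 64, 64),      # image observations
        torch.randint(0, 1000, (2, 8))  # tokenized textual instructions
    )
    print(state.shape)  # torch.Size([2, 64])
```

In the search setting described above, the per-source latents returned alongside the fused state are what an LLM design agent could inspect (for example via simple representation-quality probes) when proposing the next candidate architecture.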