Large language models (LLMs) have demonstrated strong performance on tasks expressed in natural language, particularly in zero- and few-shot settings. These tasks are typically framed as supervised (e.g., classification) or unsupervised (e.g., clustering) problems. However, little work evaluates LLMs as agents in reinforcement learning (RL) tasks (e.g., playing games), where learning occurs through interaction with an environment and a reward signal. While prior work has focused on tasks that rely on a linguistic representation, we study structured, non-linguistic reasoning, such as interpreting positions in a grid world. We therefore introduce PARL (Prompt-based Agent for Reinforcement Learning), a method that uses LLMs as RL agents through prompting alone, without any fine-tuning. PARL encodes actions, states, and rewards in the prompt, enabling the model to learn through trial-and-error interaction. We evaluate PARL on three standard RL tasks that do not rely entirely on natural language. We show that it can match or outperform traditional RL agents in simple environments by leveraging pretrained knowledge. However, we identify performance limitations in tasks that require complex mathematical operations or decoding states and actions.
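To make the interaction loop described above concrete, the following is a minimal sketch of how a prompt-based agent could be wired to a gym-style environment. The `query_llm` helper, the prompt format, and the action-parsing rule are illustrative assumptions, not the paper's actual implementation; the older gym 4-tuple `step` API is assumed.

```python
# Minimal sketch of prompt-based RL interaction in the spirit of PARL.
# Assumptions (not from the paper): a gym-style environment `env` with a
# discrete action space and the old 4-tuple `step` API, and a hypothetical
# `query_llm(prompt) -> str` helper that returns the model's completion.

def format_prompt(history, state, actions):
    """Serialize the interaction history and the current state into a prompt."""
    lines = ["You are an agent choosing actions to maximize reward."]
    for s, a, r in history:
        lines.append(f"state: {s} | action: {a} | reward: {r}")
    lines.append(f"state: {state} | available actions: {actions}")
    lines.append("Next action:")
    return "\n".join(lines)

def run_episode(env, query_llm, max_steps=50):
    """Trial-and-error loop: the LLM picks actions given its prompt history."""
    history = []
    state = env.reset()
    for _ in range(max_steps):
        actions = list(range(env.action_space.n))
        prompt = format_prompt(history, state, actions)
        reply = query_llm(prompt)
        # Fall back to a default action if the reply cannot be parsed.
        try:
            action = int(reply.strip().split()[0])
        except (ValueError, IndexError):
            action = actions[0]
        next_state, reward, done, _ = env.step(action)
        history.append((state, action, reward))
        state = next_state
        if done:
            break
    return history
```

In this sketch, learning happens purely in context: the growing history of (state, action, reward) triples in the prompt is the only feedback the model receives, with no gradient updates to its weights.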