The deployment of Large Language Models (LLMs) in interactive systems requires deep alignment with the nuanced and dynamic preferences of individual users. Current alignment techniques predominantly target universal human values or static, single-turn preferences, and thus fail to support long-term personalization or to handle the cold-start problem for new users. To bridge this gap, we propose PersonalAgent, a novel user-centric lifelong agent designed to continuously infer and adapt to user preferences. PersonalAgent constructs and dynamically refines a unified user profile by decomposing dialogues into single-turn interactions and framing preference inference as a sequential decision-making task. Experiments show that PersonalAgent outperforms strong prompt-based and policy-optimization baselines, in both idealized and noisy conversational contexts, while preserving cross-session preference consistency. Furthermore, human evaluation confirms that PersonalAgent captures user preferences naturally and coherently. Our findings underscore the importance of lifelong personalization for developing more inclusive and adaptive conversational agents. Our code is available here.
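To make the framing concrete, the following is a minimal sketch of the lifelong preference-inference loop described above: a dialogue is decomposed into single-turn interactions, and at each turn the system decides whether and how to refine a unified user profile. All names here (`UserProfile`, `infer_preference`, `run_session`) are hypothetical illustrations under our own assumptions, not PersonalAgent's actual API; a real system would use an LLM where we use a trivial keyword rule.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple, List, Dict

@dataclass
class UserProfile:
    """Unified user profile accumulated and refined across sessions."""
    preferences: Dict[str, str] = field(default_factory=dict)

def infer_preference(turn: str) -> Optional[Tuple[str, str]]:
    """Toy single-turn inference: extract an (aspect, value) preference, if any.
    Stand-in for an LLM-based inference step (hypothetical)."""
    if "prefer" in turn:
        # e.g. "I prefer concise answers." -> ("style", "concise answers")
        return ("style", turn.split("prefer", 1)[1].strip(" ."))
    return None

def run_session(profile: UserProfile, dialogue: List[str]) -> UserProfile:
    """Decompose a dialogue into single-turn interactions and sequentially
    decide, turn by turn, whether to update the profile."""
    for turn in dialogue:
        evidence = infer_preference(turn)
        if evidence is not None:
            aspect, value = evidence
            # Sequential decision: later evidence refines earlier entries,
            # so the profile tracks preferences as they drift over time.
            profile.preferences[aspect] = value
    return profile

profile = UserProfile()
profile = run_session(profile, ["Hi!", "I prefer concise answers."])
print(profile.preferences)  # {'style': 'concise answers'}
```

Because the profile persists across calls to `run_session`, the same loop addresses both the cold-start case (an empty profile that fills in as evidence arrives) and cross-session consistency (earlier inferences carry over unless new evidence revises them).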