Large Language Models (LLMs) demonstrate significant advantages in leveraging structured world knowledge and performing multi-step reasoning. However, fundamental challenges arise when transforming LLMs into real-world recommender systems due to semantic and behavioral misalignment. To bridge this gap, we propose Align$^3$GR, a novel framework that unifies token-level, behavior modeling-level, and preference-level alignment. Our approach introduces: (1) a dual tokenization that fuses user-item semantic and collaborative signals; (2) enhanced behavior modeling with bidirectional semantic alignment; and (3) a progressive DPO strategy combining self-play (SP-DPO) and real-world feedback (RF-DPO) for dynamic preference adaptation. Experiments show that Align$^3$GR outperforms the SOTA baseline by +17.8% in Recall@10 and +20.2% in NDCG@10 on public datasets, with significant gains in online A/B tests and in full-scale deployment on a large-scale industrial recommendation platform.
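For context, the preference-level alignment in SP-DPO and RF-DPO presumably builds on the standard Direct Preference Optimization objective (Rafailov et al., 2023); the following is that standard formulation only, not the specific Align$^3$GR variants, where $y_w$ and $y_l$ denote the preferred and dispreferred item sequences for context $x$, $\pi_{\mathrm{ref}}$ is the frozen reference policy, and $\beta$ controls the deviation from it:
$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
$$
In the self-play and real-world-feedback stages described above, the preference pairs $(y_w, y_l)$ would come from model-generated rollouts and logged user feedback, respectively.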