Utilizing powerful Large Language Models (LLMs) for generative recommendation has attracted considerable attention. However, a key challenge lies in transforming recommendation data into the language space of LLMs through effective item tokenization. Current approaches, such as ID-based, textual, and codebook-based identifiers, exhibit shortcomings in encoding semantic information, incorporating collaborative signals, or handling code assignment bias. To address these limitations, we propose LETTER (a LEarnable Tokenizer for generaTivE Recommendation), which integrates hierarchical semantics, collaborative signals, and code assignment diversity to satisfy the essential requirements of identifiers. LETTER incorporates a Residual Quantized VAE (RQ-VAE) for semantic regularization, a contrastive alignment loss for collaborative regularization, and a diversity loss to mitigate code assignment bias. We instantiate LETTER on two generative recommender models and propose a ranking-guided generation loss that theoretically strengthens their ranking ability. Experiments on three datasets validate the superiority of LETTER, advancing the state of the art in LLM-based generative recommendation.
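To make the three regularizers concrete, the following is a minimal sketch, assuming a PyTorch setting with hypothetical inputs: `sem_emb` (encoder outputs of item semantic text), `cf_emb` (pretrained collaborative-filtering embeddings), and `codebooks` (a list of learnable code embedding tables). It is illustrative only, not the authors' implementation; in particular, `diversity_loss` is one simple proxy for the diversity objective rather than the exact loss used in LETTER.

```python
import torch
import torch.nn.functional as F

def residual_quantize(sem_emb, codebooks):
    """RQ-VAE-style residual quantization: at each level, snap the current
    residual to its nearest code, then quantize what remains."""
    residual, quantized = sem_emb, torch.zeros_like(sem_emb)
    codes, commit_loss = [], sem_emb.new_zeros(())
    for codebook in codebooks:                              # codebook: (K, d)
        dists = torch.cdist(residual, codebook)             # (B, K) distances
        idx = dists.argmin(dim=-1)                          # nearest code id
        chosen = codebook[idx]                              # (B, d)
        commit_loss = commit_loss + F.mse_loss(residual, chosen.detach()) \
            + 0.25 * F.mse_loss(residual.detach(), chosen)
        quantized = quantized + chosen
        residual = residual - chosen.detach()
        codes.append(idx)
    # straight-through estimator: gradients pass to the encoder unchanged
    z_q = sem_emb + (quantized - sem_emb).detach()
    return z_q, torch.stack(codes, dim=-1), commit_loss

def alignment_loss(z_q, cf_emb, tau=0.07):
    """Contrastive (InfoNCE-style) alignment of quantized semantic
    embeddings with collaborative embeddings via in-batch negatives."""
    z = F.normalize(z_q, dim=-1)
    c = F.normalize(cf_emb, dim=-1)
    logits = z @ c.t() / tau                                # (B, B) similarities
    labels = torch.arange(z.size(0), device=z.device)       # diagonal = positives
    return F.cross_entropy(logits, labels)

def diversity_loss(dists, tau=1.0):
    """Proxy diversity regularizer: penalize skewed code usage by minimizing
    the negative entropy of the mean soft code assignment."""
    probs = F.softmax(-dists / tau, dim=-1).mean(dim=0)     # (K,) usage dist.
    return (probs * probs.clamp_min(1e-9).log()).sum()

# Example with hypothetical shapes: B=32 items, d=64, two levels of K=256 codes
sem_emb = torch.randn(32, 64)
cf_emb = torch.randn(32, 64)
codebooks = [torch.nn.Parameter(torch.randn(256, 64)) for _ in range(2)]
z_q, codes, l_commit = residual_quantize(sem_emb, codebooks)
loss = l_commit + alignment_loss(z_q, cf_emb) \
     + diversity_loss(torch.cdist(sem_emb, codebooks[0]))
```

The design choice illustrated here is that all three losses act on the same quantization pass: the commitment term shapes hierarchical semantics, the contrastive term pulls quantized identifiers toward collaborative structure, and the diversity term spreads items across codes to counter assignment bias.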