Using Interventions to Improve Out-of-Distribution Generalization of Text-Matching Recommendation Systems

Given a user's input text, text-matching recommender systems output relevant items by comparing the input text to the descriptions of available items, as in product-to-product recommendation on e-commerce platforms. Because users' interests and item inventory are expected to change, a text-matching system must generalize under data shifts, a task known as out-of-distribution (OOD) generalization. However, we find that the popular approach of fine-tuning a large base language model on paired item relevance data (e.g., user clicks) can be counterproductive for OOD generalization. For a product recommendation task, fine-tuning obtains worse accuracy than the base model when recommending items in a new category or for a future time period. To explain this generalization failure, we consider an intervention-based importance metric, which shows that a fine-tuned model captures spurious correlations and fails to learn the causal features that determine the relevance between any two text inputs. Moreover, standard methods for causal regularization do not apply in this setting because, unlike in images, there are no universally spurious features in a text-matching task: the same token may be spurious or causal depending on the text it is being matched to. For OOD generalization on text inputs, therefore, we highlight a different goal: avoiding high importance scores for certain features. We do so using an intervention-based regularizer that constrains the causal effect of any token on the model's relevance score to be similar to its effect under the base model. Results on an Amazon product dataset and three question recommendation datasets show that our proposed regularizer improves generalization for both in-distribution and OOD evaluation, especially in difficult scenarios where the base model is not accurate.
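
The abstract does not give the regularizer's exact form, so the following is only a minimal sketch of the general idea under stated assumptions: the intervention is taken to be replacing one token with a placeholder, a token's effect is the resulting change in relevance score, and the penalty is the mean squared gap between the fine-tuned and base models' effects. The names `token_effects`, `intervention_penalty`, `MASK`, and both toy scorers are hypothetical and stand in for real language models.

```python
from typing import Callable, List, Sequence

# A scorer maps (query tokens, item tokens) to a relevance score.
Scorer = Callable[[Sequence[str], Sequence[str]], float]

MASK = "[MASK]"  # hypothetical placeholder used as the intervention


def token_effects(score: Scorer, query: Sequence[str],
                  item: Sequence[str]) -> List[float]:
    """Change in relevance score when each query token is masked in turn."""
    original = score(query, item)
    effects = []
    for i in range(len(query)):
        intervened = list(query)
        intervened[i] = MASK  # intervene on exactly one token
        effects.append(original - score(intervened, item))
    return effects


def intervention_penalty(finetuned: Scorer, base: Scorer,
                         query: Sequence[str], item: Sequence[str]) -> float:
    """Mean squared gap between per-token effects of the two models."""
    ft = token_effects(finetuned, query, item)
    bs = token_effects(base, query, item)
    return sum((f - b) ** 2 for f, b in zip(ft, bs)) / max(len(query), 1)


def overlap_score(query: Sequence[str], item: Sequence[str]) -> float:
    """Toy stand-in for a base model: Jaccard overlap of token sets."""
    q, d = set(query), set(item)
    return len(q & d) / len(q | d) if q | d else 0.0


def brand_biased_score(query: Sequence[str], item: Sequence[str]) -> float:
    """Toy stand-in for a fine-tuned model that over-weights one token."""
    bonus = 0.5 if "acme" in query and "acme" in item else 0.0
    return overlap_score(query, item) + bonus


query = ["acme", "usb", "cable"]
item = ["acme", "hdmi", "cable"]
same = intervention_penalty(overlap_score, overlap_score, query, item)
shifted = intervention_penalty(brand_biased_score, overlap_score, query, item)
print(same, shifted)
```

In this toy setup the penalty is zero when both scorers agree and positive when the fine-tuned stand-in assigns extra importance to the token `acme`. In training, such a term would presumably be added to the task loss with a weighting coefficient; that combination is an assumption here, not something the abstract specifies.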
