A membership inference attack (MIA) aims to determine whether a specific data sample was included in the training dataset of a target model. Traditional MIA approaches rely on shadow models to mimic the target model's behavior, but their effectiveness diminishes for Large Language Model (LLM)-based recommendation systems due to the scale and complexity of the training data. This paper introduces a novel knowledge distillation-based MIA paradigm tailored to LLM-based recommendation systems. Our method constructs a reference model via distillation, applying distinct strategies to member and non-member data to enhance discriminative capability. The paradigm extracts fused features (e.g., confidence, entropy, loss, and hidden-layer vectors) from the reference model to train an attack model, overcoming the limitations of any individual feature. Extensive experiments on extended datasets (Last.FM, MovieLens, Book-Crossing, Delicious) and diverse LLMs (T5, GPT-2, LLaMA3) demonstrate that our approach significantly outperforms shadow-model-based MIAs and individual-feature baselines, showing its practicality for privacy attacks on LLM-driven recommender systems.
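The fused-feature idea in the abstract can be illustrated with a minimal sketch. Assuming the reference model exposes output logits for a sample and the index of the ground-truth item, three of the named signals (confidence, entropy, and loss) can be computed per sample and concatenated into a feature vector for the attack model. The helper name `mia_features` is hypothetical, and hidden-layer vectors are omitted for brevity:

```python
import numpy as np

def mia_features(logits, target_id):
    """Hypothetical sketch: fuse per-sample MIA signals from a
    reference model's output logits over candidate items.
    Hidden-layer vectors from the paper's paradigm are omitted here."""
    z = logits - logits.max()                 # stabilized softmax
    probs = np.exp(z) / np.exp(z).sum()
    confidence = probs.max()                  # peak predicted probability
    entropy = -(probs * np.log(probs + 1e-12)).sum()   # output uncertainty
    loss = -np.log(probs[target_id] + 1e-12)  # cross-entropy on the true item
    return np.array([confidence, entropy, loss])

# Example: a 3-item candidate set where item 0 is the ground truth
feats = mia_features(np.array([2.0, 1.0, 0.0]), target_id=0)
```

Feature vectors like these, collected for known member and non-member samples, would then serve as training inputs to a binary attack classifier, which is the role of the attack model described above.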