While existing social bot detectors perform well on benchmarks, their robustness across diverse real-world scenarios remains limited due to unclear ground truth and varied misleading cues. In particular, the impact of shortcut learning, where models rely on spurious correlations instead of capturing causal, task-relevant features, has received limited attention. To address this gap, we conduct an in-depth study of how detectors are influenced by potential shortcuts based on textual features, which are the most susceptible to manipulation by social bots. We design a series of shortcut scenarios by constructing spurious associations between user labels and superficial textual cues to evaluate model robustness. Results show that shifts in the distribution of irrelevant features significantly degrade detector performance, with an average relative accuracy drop of 32\% across the baseline models. To tackle this challenge, we propose mitigation strategies based on large language models that leverage counterfactual data augmentation. These strategies address the problem from both data and model perspectives at three levels: the data distribution of individual user texts, the data distribution of the overall dataset, and the model's ability to extract causal information. Under shortcut scenarios, our strategies achieve an average relative performance improvement of 56\%.
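To make the shortcut-scenario construction concrete, the following is a minimal sketch of how a spurious association between a superficial textual cue and user labels could be injected into a dataset. The function name, the cue string, and the probabilities are illustrative assumptions, not the paper's actual construction; the idea is that the cue co-occurs with the "bot" label in training and can be flipped at test time to shift the irrelevant-feature distribution without changing any causal content.

```python
import random


def inject_shortcut(samples, cue=" #follow", p_bot=0.9, p_human=0.1, seed=0):
    """Illustrative sketch (not the paper's exact procedure):
    append a superficial cue mostly to bot-labeled texts, creating a
    spurious label-cue correlation. Swapping p_bot and p_human when
    building the test split shifts the irrelevant-feature distribution
    while leaving the causal content of each text untouched.
    """
    rng = random.Random(seed)
    out = []
    for text, label in samples:
        p = p_bot if label == "bot" else p_human
        out.append((text + cue if rng.random() < p else text, label))
    return out
```

A detector that latches onto the cue instead of causal features will score well on a split built with the same probabilities but degrade sharply when the correlation is inverted.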