The rise of Large Language Models (LLMs) makes accurate detection of AI-generated text essential. However, current approaches largely overlook the influence of author characteristics. We investigate how sociolinguistic attributes (gender, CEFR proficiency, academic field, and language environment) affect state-of-the-art AI text detectors. Using the ICNALE corpus of human-authored texts and parallel AI-generated texts from diverse LLMs, we conduct a rigorous evaluation employing multi-factor ANOVA and weighted least squares (WLS). Our results reveal significant biases: CEFR proficiency and language environment consistently affect detector accuracy, while gender and academic field show detector-dependent effects. These findings highlight the crucial need for socially aware AI text detection that does not unfairly penalize specific demographic groups. We offer novel empirical evidence, a robust statistical framework, and actionable insights for developing more equitable and reliable detection systems in real-world, out-of-domain contexts. This work paves the way for future research on bias mitigation, inclusive evaluation benchmarks, and socially responsible LLM detectors.
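To make the statistical framework concrete, the following is a minimal sketch of a multi-factor ANOVA fitted via WLS, in the spirit of the analysis described above. It is not the authors' code: the column names (accuracy, gender, cefr, field, env, n_texts), the toy data values, and the choice of statsmodels with sample sizes as WLS weights are all illustrative assumptions.

```python
# Sketch: multi-factor ANOVA over per-group detector accuracy, fitted with WLS.
# All data and column names below are hypothetical placeholders.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# One row per demographic cell: mean detector accuracy plus the cell size.
df = pd.DataFrame({
    "accuracy": [0.78, 0.80, 0.85, 0.83, 0.90, 0.92,
                 0.74, 0.76, 0.81, 0.79, 0.88, 0.86],
    "gender":   ["F", "M", "F", "M", "F", "M",
                 "F", "M", "F", "M", "F", "M"],
    "cefr":     ["B1", "B1", "B2", "B2", "C1", "C1",
                 "B1", "B1", "B2", "B2", "C1", "C1"],
    "field":    ["STEM", "HUM", "HUM", "STEM", "STEM", "HUM",
                 "HUM", "STEM", "STEM", "HUM", "HUM", "STEM"],
    "env":      ["EFL", "EFL", "EFL", "EFL", "EFL", "EFL",
                 "ESL", "ESL", "ESL", "ESL", "ESL", "ESL"],
    "n_texts":  [120, 95, 140, 80, 60, 75, 110, 90, 130, 85, 70, 65],
})

# WLS: weight each cell by its sample size so larger groups
# contribute proportionally more to the fitted model.
model = smf.wls(
    "accuracy ~ C(gender) + C(cefr) + C(field) + C(env)",
    data=df,
    weights=df["n_texts"],
).fit()

# Type-II ANOVA table: one F-test per sociolinguistic factor,
# answering whether that factor significantly shifts detector accuracy.
print(sm.stats.anova_lm(model, typ=2))
```

In this setup, a significant F-statistic for `C(cefr)` or `C(env)` would correspond to the paper's finding that proficiency and language environment systematically shift accuracy, while interaction terms (e.g. `C(gender):C(cefr)`) could be added to probe detector-dependent effects.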