Computerized Adaptive Testing (CAT) is a widely used technology for evaluating learners' proficiency on online education platforms. By leveraging prior proficiency estimates to select questions and iteratively updating those estimates based on responses, CAT enables personalized learner modeling and has attracted substantial attention. Despite this progress, most existing work focuses primarily on improving diagnostic accuracy while overlooking the selection bias inherent in the adaptive process. Selection bias arises because question selection is strongly influenced by the estimated proficiency: easier questions are assigned to learners estimated to have lower proficiency, and harder ones to learners estimated to have higher proficiency. Because selection depends on prior estimation, this bias propagates into the diagnosis model and is further amplified during iterative updates, leading to misalignment between estimated and true proficiency and to biased predictions. Moreover, the imbalanced nature of learners' historical interactions often exacerbates the bias in diagnosis models. To address this issue, we propose a debiasing framework consisting of two key modules: Cross-Attribute Examinee Retrieval and Selective Mixup-based Regularization. First, we retrieve balanced examinees, those with relatively even distributions of correct and incorrect responses, and use them as neutral references for biased examinees. Then, mixup is applied between each biased examinee and its matched balanced counterpart under a label-consistency constraint. This augmentation enriches the diversity of bias-conflicting samples and smooths selection boundaries. Finally, extensive experiments on two benchmark datasets with multiple advanced diagnosis models demonstrate that our method substantially improves both the generalization ability and the fairness of question selection in CAT.
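To make the two modules concrete, below is a minimal sketch of the retrieval and selective-mixup steps as described above. It assumes a simple correct-ratio notion of balance and the standard Beta-sampled mixup coefficient; the threshold `tau`, mixing parameter `alpha`, and all function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np


def correct_ratio(responses):
    """Fraction of correct answers in an examinee's log (1 = correct, 0 = incorrect)."""
    return float(np.mean(responses))


def retrieve_balanced(examinee_logs, tau=0.1):
    """Cross-Attribute Examinee Retrieval (sketch): keep examinees whose
    correct/incorrect split is close to even, i.e. |ratio - 0.5| <= tau.
    `tau` is an assumed balance threshold, not a value from the paper."""
    return [eid for eid, log in examinee_logs.items()
            if abs(correct_ratio(log) - 0.5) <= tau]


def selective_mixup(x_biased, x_balanced, y_biased, y_balanced,
                    alpha=0.2, rng=None):
    """Selective Mixup-based Regularization (sketch): interpolate a biased
    examinee's representation with its matched balanced counterpart, but only
    when their labels agree, so the mixed label stays unambiguous."""
    if not np.array_equal(y_biased, y_balanced):
        return x_biased, y_biased           # label-inconsistent pair: skip mixing
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)            # standard mixup coefficient in [0, 1]
    x_mix = lam * x_biased + (1 - lam) * x_balanced
    return x_mix, y_biased                  # label unchanged under consistency


# Toy usage: "e2" has an even correct/incorrect split and serves as the
# neutral reference for mixing with a biased examinee's representation.
logs = {"e1": [1, 1, 1, 1, 0], "e2": [1, 0, 1, 0], "e3": [0, 0, 0, 0, 1]}
neutral_ids = retrieve_balanced(logs)       # -> ["e2"]
```

One design point worth noting: restricting mixup to label-consistent pairs means the augmented sample can reuse the original label directly, which is what lets the augmentation enrich bias-conflicting samples without introducing label noise.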