Robust Educational Dialogue Act Classifiers with Low-Resource and Imbalanced Datasets

From arXiv. 12 pages, full paper, The 24th International Conference on Artificial Intelligence in Education (AIED 2023).

Keywords: Educational Dialogue Act Classification, Model Robustness, Low-Resource Data, Imbalanced Data, Large Language Models

Dialogue acts (DAs) represent the conversational actions of tutors or students during tutoring dialogues. Automating the identification of DAs in tutoring dialogues is important for the design of dialogue-based intelligent tutoring systems. Many prior studies employ machine learning models to classify DAs in tutoring dialogues and invest much effort in optimizing classification accuracy with limited amounts of training data (i.e., the low-resource data scenario). Beyond classification accuracy, however, the robustness of the classifier is also important, as it reflects the classifier's capability to learn patterns from different class distributions. We note that many prior studies on classifying educational DAs employ cross entropy (CE) loss to optimize DA classifiers on low-resource data with an imbalanced DA distribution. The DA classifiers in these studies tend to prioritize accuracy on the majority class at the expense of the minority class, and so might not be robust to data with imbalanced ratios of different DA classes. To optimize the robustness of classifiers on imbalanced class distributions, we propose to optimize the performance of the DA classifier by maximizing the area under the ROC curve (AUC) score (i.e., AUC maximization). Through extensive experiments, our study provides evidence that (i) by maximizing AUC in the training process, the DA classifier achieves significant performance improvement compared to the CE approach under low-resource data, and (ii) AUC maximization approaches can improve the robustness of the DA classifier under different class imbalance ratios.
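To make the contrast between accuracy and AUC concrete, here is a minimal sketch in plain Python. It is not the paper's training objective: the abstract does not specify which AUC-maximization formulation is used, so the pairwise squared-hinge surrogate below is a common stand-in chosen for illustration. The function names, the toy labels, and the 0.5 decision threshold are all assumptions.

```python
from itertools import product


def split_scores(scores, labels):
    """Separate scores of minority (label 1) and majority (label 0) examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return pos, neg


def accuracy(scores, labels, threshold=0.5):
    """Fraction of examples classified correctly at a fixed threshold (assumed 0.5)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc_score(scores, labels):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos, neg = split_scores(scores, labels)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def pairwise_auc_loss(scores, labels, margin=1.0):
    """Illustrative surrogate for 1 - AUC: squared hinge on each positive/negative score gap."""
    pos, neg = split_scores(scores, labels)
    losses = [max(0.0, margin - (p - n)) ** 2 for p, n in product(pos, neg)]
    return sum(losses) / len(losses)


# Toy imbalanced data (hypothetical): nine majority-class examples, one minority-class example.
labels = [0] * 9 + [1]
constant_scores = [0.1] * 10         # scores every example identically
ranked_scores = [0.1] * 9 + [0.4]    # ranks the minority example highest, still below 0.5

for name, scores in [("constant", constant_scores), ("ranked", ranked_scores)]:
    print(name, accuracy(scores, labels), auc_score(scores, labels),
          pairwise_auc_loss(scores, labels))
```

Both score vectors reach the same accuracy on this toy set because the majority class dominates, yet only the second one ranks the minority example above the majority examples. AUC and the pairwise surrogate separate the two cases, which is the intuition behind optimizing AUC instead of relying on CE and accuracy under class imbalance.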
