Detecting social bias in text is challenging due to nuance, subjectivity, and the difficulty of obtaining high-quality labeled datasets at scale, especially given that social biases and society itself continue to evolve. To address these challenges, we propose a few-shot instruction-based method for prompting pre-trained language models (LMs). From a small support repository of labeled examples, we select a few label-balanced exemplars that are closest in embedding space to the query to be labeled. We then provide the LM with an instruction consisting of this subset of labeled exemplars, the query text to be classified, and a definition of bias, and prompt it to make a decision. We demonstrate that large LMs used in a few-shot context can detect different types of fine-grained biases with accuracy similar, and sometimes superior, to that of fine-tuned models. We observe that the largest 530B-parameter model is significantly more effective at detecting social bias than smaller models (achieving at least a 20% improvement in AUC over the other models). It also maintains a high AUC (dropping by less than 5%) in a few-shot setting when the labeled repository is reduced to as few as 100 samples. Large pre-trained LMs thus make it easier and quicker to build new bias detectors.
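As a concrete illustration of the method described above, the sketch below shows one plausible implementation of the exemplar-selection and prompt-construction steps: retrieve the nearest labeled examples per class in embedding space, then assemble the instruction from the bias definition, the exemplars, and the query. The encoder choice (sentence-transformers), field names, and prompt wording are our own assumptions, not the paper's exact setup.

```python
# Minimal sketch: label-balanced nearest-neighbor exemplar selection and
# few-shot prompt construction. Encoder and prompt format are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def select_exemplars(query, support, k_per_label=2):
    """Pick the k support examples per label closest to the query in
    embedding space, keeping the exemplar set label-balanced."""
    q = embedder.encode([query])[0]
    chosen = []
    for label in sorted({ex["label"] for ex in support}):
        pool = [ex for ex in support if ex["label"] == label]
        embs = embedder.encode([ex["text"] for ex in pool])
        # Cosine similarity of each pooled example to the query.
        sims = embs @ q / (np.linalg.norm(embs, axis=1) * np.linalg.norm(q))
        for i in np.argsort(-sims)[:k_per_label]:
            chosen.append(pool[i])
    return chosen

def build_prompt(query, exemplars, definition):
    """Assemble the few-shot instruction: bias definition, labeled
    exemplars, then the unlabeled query for the LM to complete."""
    parts = [f"Definition: {definition}", ""]
    for ex in exemplars:
        parts.append(f"Text: {ex['text']}\nBiased: {ex['label']}\n")
    parts.append(f"Text: {query}\nBiased:")
    return "\n".join(parts)

# Example usage with a toy support repository:
support = [
    {"text": "Women are too emotional to lead.", "label": "yes"},
    {"text": "The meeting starts at noon.", "label": "no"},
    {"text": "People from that country are lazy.", "label": "yes"},
    {"text": "The report covers quarterly sales.", "label": "no"},
]
prompt = build_prompt(
    "Older workers can't learn new technology.",
    select_exemplars("Older workers can't learn new technology.", support),
    "Social bias is prejudice toward a group based on identity.",
)
print(prompt)  # this string would be sent to the LM for completion
```

The resulting prompt string would then be passed to the pre-trained LM, whose completion (e.g., "yes" or "no" in this sketch) serves as the bias-detection decision.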