With the growing popularity of social networks, hate speech has increased significantly in recent years. Because it harms both minority groups and large communities, there is a pressing need to detect and filter it. However, automatic approaches should not jeopardize free speech, so they should accompany their decisions with explanations and an assessment of uncertainty. Thus, there is a need for predictive machine learning models that not only detect hate speech but also help users understand when texts cross the line and become unacceptable. The reliability of predictions is usually not addressed in text classification. We fill this gap by adapting deep neural networks so that they can efficiently estimate prediction uncertainty. To reliably detect hate speech, we use Monte Carlo dropout regularization, which mimics Bayesian inference within neural networks. We evaluate our approach using different text embedding methods. We visualize the reliability of results with a novel technique that aids in understanding classification reliability and errors.
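The idea behind Monte Carlo dropout can be illustrated with a minimal sketch. It is not the model described in the abstract: the tiny one-hidden-layer network, the hand-picked weights `W1` and `W2`, the input `x` (standing in for a text embedding), and the function names are all hypothetical. The only point it shows is that dropout stays active at prediction time, so repeated forward passes give a distribution of probabilities whose mean serves as the prediction and whose spread serves as an uncertainty estimate.

```python
import math
import random
import statistics


def forward_with_dropout(x, W1, W2, p_drop, rng):
    """One stochastic forward pass: ReLU hidden layer, dropout, sigmoid output."""
    hidden = []
    for row in W1:
        activation = max(0.0, sum(w * xi for w, xi in zip(row, x)))
        keep = rng.random() >= p_drop  # a fresh dropout decision on every pass
        # Inverted dropout: rescale kept units so the expected activation is unchanged.
        hidden.append(activation / (1.0 - p_drop) if keep else 0.0)
    logit = sum(w * h for w, h in zip(W2, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


def mc_dropout_predict(x, W1, W2, p_drop=0.5, n_samples=200, seed=0):
    """Average many stochastic passes; return (mean probability, standard deviation)."""
    rng = random.Random(seed)
    probs = [forward_with_dropout(x, W1, W2, p_drop, rng) for _ in range(n_samples)]
    return statistics.fmean(probs), statistics.pstdev(probs)


# Hypothetical stand-ins: x plays the role of a text embedding,
# W1 and W2 play the role of trained classifier weights.
x = [0.5, -1.2, 0.3, 0.8]
W1 = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
W2 = [1.0, -0.5, 0.7, 0.3]

mean_prob, spread = mc_dropout_predict(x, W1, W2)
print(f"mean probability: {mean_prob:.3f}, spread: {spread:.3f}")
```

A large spread flags a prediction that the classifier is unsure about. With `p_drop=0.0` every pass is identical and the spread collapses to zero, which is the behavior of an ordinary deterministic network.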