Hate speech detection research has predominantly focused on purely content-based methods that do not exploit any additional context. We briefly critique the pros and cons of this task formulation. We then investigate profiling users by their past utterances as an informative prior for predicting whether new utterances constitute hate speech. To evaluate this, we augment three Twitter hate speech datasets with additional timeline data and embed this context into a strong baseline model. Promising results suggest merit for further investigation, though analysis is complicated by differences in annotation schemes and processes, as well as by Twitter API limitations and data sharing policies.