Fine-tuning of Pre-trained Transformers for Hate, Offensive, and Profane Content Detection in English and Marathi

This paper describes neural models developed for the Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages Shared Task 2021. Our team, neuro-utmn-thales, participated in two tasks on binary and fine-grained classification of English tweets that contain hate, offensive, and profane content (English Subtasks A & B) and in one task on identifying problematic content in Marathi (Marathi Subtask A). For the English subtasks, we investigate the impact of fine-tuning transformer models on additional hate speech detection corpora. We also apply a one-vs-rest approach based on Twitter-RoBERTa to discriminate between hate, profane, and offensive posts. Our models ranked third in English Subtask A with an F1-score of 81.99% and second in English Subtask B with an F1-score of 65.77%. For the Marathi task, we propose a system based on Language-Agnostic BERT Sentence Embedding (LaBSE). This model achieved the second-best result in Marathi Subtask A, with an F1-score of 88.08%.
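The one-vs-rest idea mentioned above can be illustrated with a minimal sketch: one binary scorer per class, with the final label taken from the most confident scorer. This is not the authors' implementation. The keyword-based scorers, the helper names, and the toy vocabulary are all hypothetical stand-ins for fine-tuned Twitter-RoBERTa binary classifiers; only the three class names come from the abstract.

```python
from typing import Callable, Dict, Set

# A scorer maps a text to a confidence that it belongs to one class.
# In the paper's setting each scorer would be a fine-tuned binary
# transformer; here it is a hypothetical keyword counter.
Scorer = Callable[[str], float]


def one_vs_rest_predict(text: str, scorers: Dict[str, Scorer]) -> str:
    """Return the label whose binary scorer is most confident.

    Ties are resolved in favour of the label inserted first.
    """
    scores = {label: scorer(text) for label, scorer in scorers.items()}
    return max(scores, key=scores.get)


def keyword_scorer(keywords: Set[str]) -> Scorer:
    """Build a toy scorer: the fraction of tokens found in `keywords`."""
    def score(text: str) -> float:
        tokens = text.lower().split()
        hits = sum(token in keywords for token in tokens)
        return hits / max(len(tokens), 1)
    return score


# Toy vocabulary, assumed purely for illustration.
toy_scorers: Dict[str, Scorer] = {
    "hate": keyword_scorer({"despise"}),
    "offensive": keyword_scorer({"idiot"}),
    "profane": keyword_scorer({"damn"}),
}

if __name__ == "__main__":
    print(one_vs_rest_predict("you idiot", toy_scorers))
    print(one_vs_rest_predict("damn damn it", toy_scorers))
```

Training one binary model per class lets each model specialise on a single boundary, at the cost of running several models at inference time.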

