Doxing refers to the practice of disclosing sensitive personal information about a person without their consent. This form of cyberbullying is an unpleasant and sometimes dangerous phenomenon on online social networks. Although prior work exists on the automated identification of other types of cyberbullying, methods capable of detecting doxing on Twitter specifically are still needed. We propose and evaluate a set of approaches for automatically detecting second- and third-party disclosures of sensitive private information on Twitter, a subset of which constitutes doxing. We summarize our findings on the common intentions behind doxing episodes and compare nine approaches to automated detection, based on string-matching and one-hot encoded heuristics as well as on word and contextualized string embedding representations of tweets. We identify an approach that achieves 96.86% accuracy and 97.37% recall using contextualized string embeddings, and we conclude by discussing the practicality of our proposed methods.
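To make the simplest family of approaches concrete, the following is a minimal sketch of what a string-matching heuristic with a one-hot style encoding could look like. It is an illustration under stated assumptions, not the method evaluated here: the pattern names, the regular expressions, and the function `heuristic_features` are all hypothetical, and the abstract does not specify which kinds of sensitive information the actual heuristics target.

```python
import re
from typing import List

# Hypothetical patterns chosen purely for illustration; the abstract does not
# list the heuristics actually used.
PATTERNS = {
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone_like": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "email_like": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    # A user mention hints that the tweet is about a second or third party.
    # The lookbehind keeps the "@" inside an email address from counting.
    "mention": re.compile(r"(?<!\w)@\w+"),
}


def heuristic_features(tweet: str) -> List[int]:
    """Return one binary indicator per pattern, in the order of PATTERNS."""
    return [1 if pattern.search(tweet) else 0 for pattern in PATTERNS.values()]


if __name__ == "__main__":
    for text in (
        "@bob lives here, call 555-123-4567",
        "her ssn is 123-45-6789, mail jane@example.com",
        "nice weather today",
    ):
        print(heuristic_features(text), text)
```

A binary vector of this kind could be passed to any standard classifier. Pattern matching alone cannot tell whether a disclosure is consensual or whose information it is, which is one plausible reason to also compare embedding-based representations of the tweet text.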