In this study, we investigate the scalability of state-of-the-art user profiling technologies across different online domains. More specifically, this work aims to understand the reliability and limitations of current computational stylometry approaches when they are applied to underground fora, where the user population potentially differs from that of other online platforms (predominantly male, younger, and with greater computer use) and where cyber offenders attempt to hide their identity. Because no ground truth is available for these fora, and no validated criminal data from historic investigations can be used for validation, we collected new data from clearweb forums that do include user demographics. These forums (e.g., tech communities) could be more closely related to underground fora in terms of user population than commonly used social media benchmark datasets, which show a more balanced user population.