Microblog Topic Identification using Linked Open Data

The extensive use of social media for sharing and obtaining information has led to the development of topic detection models that help make sense of the overwhelming volume of short, scattered posts. Probabilistic topic models, such as Latent Dirichlet Allocation, and matrix-factorization-based approaches, such as Latent Semantic Analysis and Non-negative Matrix Factorization, represent topics as sets of terms that are useful for many automated processes. However, determining what a topic is about is left as a further task. Alternatively, techniques that produce summaries yield output that humans can readily understand but that is less suitable for automated processing. This work proposes an approach that uses Linked Open Data (LOD) resources to extract semantically represented topics from collections of microposts. The proposed approach uses entity linking to identify the elements of topics in microposts. The elements are related through co-occurrence graphs, which are processed to yield topics. The topics are represented using an ontology introduced for this purpose. A prototype of the approach is used to identify topics from 11 datasets consisting of more than one million posts collected from Twitter during various events, such as the 2016 US election debates and the death of Carrie Fisher. The characteristics of the approach, with more than 5 thousand generated topics, are described in detail. The potential of semantic topics to reveal information that is not otherwise easily observable is demonstrated with semantic queries of varying complexity. A human evaluation of topics from 36 randomly selected intervals resulted in a precision of 81.0% and F1 score of 93.3%. Furthermore, the topics are compared with those generated from the same datasets by an approach that produces human-readable topics from microblog post collections.
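The pipeline outlined in the abstract (link entities, relate them through a co-occurrence graph, process the graph into topics) can be illustrated with a minimal sketch. The abstract does not say how the graph is processed, so the thresholded connected-components step below is an assumption standing in for the authors' method. The function names, the `min_weight` parameter, and the entity identifiers are likewise hypothetical, and the entity-linking step is assumed to have already run.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_graph(posts):
    """Count how often each pair of linked entities appears in the same post.

    `posts` is an iterable of entity collections, one per micropost,
    assumed to be the output of an entity-linking step.
    """
    weights = Counter()
    for entities in posts:
        # Sort so each unordered pair always maps to the same (a, b) key.
        for a, b in combinations(sorted(set(entities)), 2):
            weights[(a, b)] += 1
    return weights


def extract_topics(weights, min_weight=2):
    """Group entities into topics (assumption: connected components
    over edges whose co-occurrence count is at least `min_weight`)."""
    adjacency = defaultdict(set)
    for (a, b), weight in weights.items():
        if weight >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)

    topics, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        topics.append(component)
    return topics


if __name__ == "__main__":
    # Illustrative entity sets only; not taken from the paper's datasets.
    posts = [
        {"dbr:Hillary_Clinton", "dbr:Donald_Trump"},
        {"dbr:Hillary_Clinton", "dbr:Donald_Trump", "dbr:Debate"},
        {"dbr:Carrie_Fisher", "dbr:Star_Wars"},
        {"dbr:Carrie_Fisher", "dbr:Star_Wars"},
    ]
    for topic in extract_topics(build_cooccurrence_graph(posts)):
        print(sorted(topic))
```

Representing each topic as a set of entity identifiers rather than bare terms is what would later allow it to be expressed with an ontology and queried semantically; the threshold simply drops pairs that co-occur too rarely to count as related.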

