Microblogging platforms are a popular means of real-time communication and information sharing. The volume of user-generated content they carry is so large that their users suffer from an information deluge. To address this, numerous recommendation methods have been proposed that organize the posts a user receives according to her interests. Content-based methods typically build a text-based model for each individual user to capture her tastes and then rank the posts in her timeline by their similarity to that model. Even though content-based methods have attracted considerable interest in the data management community, there is no comprehensive evaluation of the main factors that affect their performance. These are: (i) the representation model, which converts unstructured text into a structured representation that elucidates its characteristics, (ii) the source of the microblog posts that compose the user models, and (iii) the type of the user's posting activity. To fill this gap, we systematically examine the performance of 9 state-of-the-art representation models in combination with 13 representation sources and 3 user types over a large, real dataset from Twitter comprising 60 users. We also consider 223 plausible configurations of the representation models in order to assess their robustness with respect to their internal parameters. To facilitate the interpretation of our experimental results, we introduce a novel taxonomy of representation models. Our analysis provides novel insights into the performance and functionality of the main factors that determine the effectiveness of content-based recommendation in microblogs.
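The content-based pipeline described above (build a text model per user, then rank timeline posts by similarity to it) can be illustrated with a minimal sketch. It assumes a plain bag-of-words representation with raw term counts and cosine similarity, which is only one simple stand-in and not necessarily one of the 9 representation models evaluated here. The function names (`tokenize`, `bag_of_words`, `cosine`, `rank_timeline`) and the sample posts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word-like tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def bag_of_words(texts):
    """Aggregate term counts over a collection of posts (assumed representation)."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_timeline(source_posts, timeline):
    """Rank timeline posts by decreasing similarity to the user model.

    source_posts: posts used to build the user model (the representation source).
    timeline: candidate posts the user receives.
    """
    user_model = bag_of_words(source_posts)
    scored = [(cosine(user_model, bag_of_words([post])), post) for post in timeline]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [post for _, post in scored]


# Hypothetical data: the user's own posts serve as the representation source.
source_posts = ["Training a machine learning model in python", "python tips for data science"]
timeline = ["Football match tonight", "New python machine learning library released"]
ranked = rank_timeline(source_posts, timeline)
```

In this sketch, swapping `source_posts` for a different set of posts (for example, posts the user reshared rather than wrote) corresponds to changing the representation source, while replacing `bag_of_words` and `cosine` corresponds to changing the representation model.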