Massive volumes of data continuously generated on social platforms have become an important information source for users. A primary method for obtaining fresh and valuable information from social streams is \emph{social search}. Although social search has been studied extensively, existing methods focus only on the \emph{relevance} of query results and ignore their \emph{representativeness}. In this paper, we propose a novel Semantic and Influence aware $k$-Representative ($k$-SIR) query for social streams based on topic modeling. Specifically, we represent both user queries and elements as vectors in the topic space. A $k$-SIR query retrieves a set of $k$ elements with the maximum \emph{representativeness} over the sliding window at query time w.r.t. the query vector. The representativeness of an element set comprises both semantic and influence scores computed by the topic model. We then design two approximation algorithms, namely \textsc{Multi-Topic ThresholdStream} (MTTS) and \textsc{Multi-Topic ThresholdDescend} (MTTD), to process $k$-SIR queries in real time. Both algorithms leverage the ranked lists maintained on each topic to process $k$-SIR queries with theoretical guarantees. Extensive experiments on real-world datasets demonstrate the effectiveness of $k$-SIR queries compared with existing methods, as well as the efficiency and scalability of our proposed algorithms for $k$-SIR processing.