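[Paper Recommendations] Six Recent Papers on Topic Modeling: Regularized Variational Inference Topic Models, Nonparametric Priors, Online Chat, Word Sense Disambiguation, Neural Language Models

[Overview] The Zhuanzhi content team has compiled six recent papers on topic modeling and introduces them below.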

1. Topic Modeling on Health Journals with Regularized Variational Inference




Authors: Robert Giaquinto, Arindam Banerjee

Abstract: Topic modeling enables exploration and compact representation of a corpus. The CaringBridge (CB) dataset is a massive collection of journals written by patients and caregivers during a health crisis. Topic modeling on the CB dataset, however, is challenging due to the asynchronous nature of multiple authors writing about their health journeys. To overcome this challenge we introduce the Dynamic Author-Persona topic model (DAP), a probabilistic graphical model designed for temporal corpora with multiple authors. The novelty of the DAP model lies in its representation of authors by a persona, where personas capture the propensity to write about certain topics over time. Further, we present a regularized variational inference algorithm, which we use to encourage the DAP model's personas to be distinct. Our results show significant improvements over competing topic models, particularly after regularization, and highlight the DAP model's unique ability to capture common journeys shared by different authors.


Source: arXiv, January 16, 2018

URL:

http://www.zhuanzhi.ai/document/54eaa7fc454fdd76f151d73e09800876


2. Latent nested nonparametric priors




Authors: Federico Camerlenghi, David B. Dunson, Antonio Lijoi, Igor Prünster, Abel Rodríguez

Abstract: Discrete random structures are important tools in Bayesian nonparametrics and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and, then, normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov chain Monte Carlo sampler for Bayesian inferences. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.

Source: arXiv, January 16, 2018

URL:

http://www.zhuanzhi.ai/document/1d93522cec3fa8b21451dd3738528d05
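
The construction described in the abstract above (add a common and a group-specific completely random measure, then normalise) can be written compactly. The notation here is assumed for illustration and does not come from the abstract: $\mu_S$ stands for the common measure, $\mu_\ell$ for the measure specific to group $\ell$, $\mathbb{X}$ for the sample space, and $d$ for the number of groups.

```latex
\tilde{p}_\ell \;=\; \frac{\mu_\ell + \mu_S}{\mu_\ell(\mathbb{X}) + \mu_S(\mathbb{X})},
\qquad \ell = 1, \dots, d
```

This shows only the add-then-normalise step. How the group-specific measures are themselves drawn (the nesting) is not captured by this expression.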


3. Between an Arena and a Sports Bar: Online Chats of eSports Spectators




Authors: Ilya Musabirov, Denis Bulygin, Paul Okopny, Ksenia Konstantinova

Abstract: ESports tournaments, such as Dota 2's The International (TI), attract millions of spectators to watch broadcasts on online streaming platforms, to communicate, and to share their experience and emotions. Unlike traditional streams, tournament broadcasts lack a streamer figure to which spectators can appeal directly. Using topic modelling and cross-correlation analysis of more than three million messages from 86 games of TI7, we uncover the main topical and temporal patterns of communication. First, we disentangle contextual meanings of emotes and memes, which play a salient role in communication, and show a meta-topics semantic map of streaming slang. Second, our analysis shows a prevalence of event-driven game communication during tournament broadcasts and particular topics associated with the event peaks. Third, we show that "copypasta" cascades and other related practices, while occupying a significant share of messages, are strongly associated with periods of lower in-game activity. Based on the analysis, we propose design ideas to support different modes of spectators' communication.

Source: arXiv, January 9, 2018

URL:

http://www.zhuanzhi.ai/document/0595c45b2eed2dc044064cc66c1f85e0
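
The abstract above mentions cross-correlation analysis between chat activity and in-game events. As a rough illustration of that kind of analysis (not the authors' pipeline), the sketch below computes a lagged Pearson correlation between two equally binned count series. The function name `lagged_cross_correlation` and the synthetic series are hypothetical.

```python
import numpy as np


def lagged_cross_correlation(x, y, max_lag):
    """Pearson correlation between x[t] and y[t + lag] for each lag in [-max_lag, max_lag].

    x and y are 1-D series of equal length, e.g. in-game events per time bin (x)
    and chat messages per time bin (y). A peak at a positive lag means y tends
    to follow x by that many bins.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: n - lag], y[lag:]      # pairs (x[t], y[t + lag])
        else:
            a, b = x[-lag:], y[: n + lag]     # same pairing for negative lags
        result[lag] = float(np.corrcoef(a, b)[0, 1])
    return result


# Synthetic example: chat activity that copies the event series two bins later.
rng = np.random.default_rng(0)
events = rng.poisson(3, size=200).astype(float)
chat = np.concatenate([np.zeros(2), events[:-2]])

correlations = lagged_cross_correlation(events, chat, max_lag=5)
best_lag = max(correlations, key=correlations.get)
print(best_lag)
```

In real chat data the peak is rarely this clean, so the lag with the highest correlation is best read as a tendency rather than a fixed delay.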


4. Knowledge-based Word Sense Disambiguation using Topic Models




Authors: Devendra Singh Chaplot, Ruslan Salakhutdinov

Abstract: Word Sense Disambiguation (WSD) is an open problem in Natural Language Processing which is particularly challenging and useful in the unsupervised setting where all the words in any given text need to be disambiguated without using any labeled data. Typically WSD systems use the sentence or a small window of words around the target word as the context for disambiguation because their computational complexity scales exponentially with the size of the context. In this paper, we leverage the formalism of topic models to design a WSD system that scales linearly with the number of words in the context. As a result, our system is able to utilize the whole document as the context for a word to be disambiguated. The proposed method is a variant of Latent Dirichlet Allocation in which the topic proportions for a document are replaced by synset proportions. We further utilize the information in WordNet by assigning a non-uniform prior to the synset distribution over words and a logistic-normal prior for the document distribution over synsets. We evaluate the proposed method on Senseval-2, Senseval-3, SemEval-2007, SemEval-2013 and SemEval-2015 English All-Word WSD datasets and show that it outperforms the state-of-the-art unsupervised knowledge-based WSD system by a significant margin.


Source: arXiv, January 6, 2018

URL:

http://www.zhuanzhi.ai/document/7c88481a97379dde4cc5761cde0037b0


5. Topic Compositional Neural Language Model




Authors: Wenlin Wang, Zhe Gan, Wenqi Wang, Dinghan Shen, Jiaji Huang, Wei Ping, Sanjeev Satheesh, Lawrence Carin

Abstract: We propose a Topic Compositional Neural Language Model (TCNLM), a novel method designed to simultaneously capture both the global semantic meaning and the local word ordering structure in a document. The TCNLM learns the global semantic coherence of a document via a neural topic model, and the probability of each learned latent topic is further used to build a Mixture-of-Experts (MoE) language model, where each expert (corresponding to one topic) is a recurrent neural network (RNN) that accounts for learning the local structure of a word sequence. In order to train the MoE model efficiently, a matrix factorization method is applied, by extending each weight matrix of the RNN to be an ensemble of topic-dependent weight matrices. The degree to which each member of the ensemble is used is tied to the document-dependent probability of the corresponding topics. Experimental results on several corpora show that the proposed approach outperforms both a pure RNN-based model and other topic-guided language models. Further, our model yields sensible topics, and also has the capacity to generate meaningful sentences conditioned on given topics.

Source: arXiv, December 29, 2017

URL:

http://www.zhuanzhi.ai/document/dd777f72fa4cf6222e3a8cfa76c02c73
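
To make the "ensemble of topic-dependent weight matrices" idea from the abstract above concrete, here is a minimal sketch in which each RNN weight matrix is a mixture of per-topic matrices, weighted by the document's topic probabilities. It deliberately omits the matrix factorization the abstract mentions and uses a plain tanh RNN cell, so it is not the paper's implementation. All names (`mix_weights`, `rnn_step`, `W_experts`, `U_experts`) and sizes are assumptions.

```python
import numpy as np


def mix_weights(topic_probs, expert_weights):
    """Return sum_k topic_probs[k] * expert_weights[k].

    topic_probs: shape (K,), the document's topic probabilities.
    expert_weights: shape (K, rows, cols), one matrix per topic.
    """
    topic_probs = np.asarray(topic_probs, dtype=float)
    return np.tensordot(topic_probs, expert_weights, axes=1)


def rnn_step(x, h_prev, topic_probs, W_experts, U_experts, b):
    """One step of a plain tanh RNN whose weights depend on the topic mixture."""
    W = mix_weights(topic_probs, W_experts)  # input-to-hidden, shape (H, D)
    U = mix_weights(topic_probs, U_experts)  # hidden-to-hidden, shape (H, H)
    return np.tanh(W @ x + U @ h_prev + b)


# Hypothetical sizes: K topics, hidden size H, input size D.
K, H, D = 3, 4, 5
rng = np.random.default_rng(0)
W_experts = rng.normal(size=(K, H, D))
U_experts = rng.normal(size=(K, H, H))
b = np.zeros(H)

topic_probs = np.array([0.7, 0.2, 0.1])  # would come from the topic model
h = rnn_step(rng.normal(size=D), np.zeros(H), topic_probs, W_experts, U_experts, b)
print(h.shape)
```

Storing K full matrices per weight is expensive, which is presumably why the abstract describes a factorized form for efficient training; this sketch keeps the unfactorized version only for readability.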


6. Multilingual Topic Models




Authors: Kriste Krstovski, Michael J. Kurtz, David A. Smith, Alberto Accomazzi

Abstract: Scientific publications have evolved several features for mitigating vocabulary mismatch when indexing, retrieving, and computing similarity between articles. These mitigation strategies range from simply focusing on high-value article sections, such as titles and abstracts, to assigning keywords, often from controlled vocabularies, either manually or through automatic annotation. Various document representation schemes possess different cost-benefit tradeoffs. In this paper, we propose to model different representations of the same article as translations of each other, all generated from a common latent representation in a multilingual topic model. We start with a methodological overview on latent variable models for parallel document representations that could be used across many information science tasks. We then show how solving the inference problem of mapping diverse representations into a shared topic space allows us to evaluate representations based on how topically similar they are to the original article. In addition, our proposed approach provides means to discover where different concept vocabularies require improvement.


Source: arXiv, December 19, 2017

URL:

http://www.zhuanzhi.ai/document/fd0f8c1e4f305e0d3784574b22ccb4d5
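
The abstract above describes comparing representations of an article by how topically similar they are once mapped into a shared topic space. The abstract does not say which similarity measure is used, so the sketch below picks the Hellinger distance purely as an assumed example. The function name and the topic vectors are hypothetical.

```python
import numpy as np


def hellinger_distance(p, q):
    """Hellinger distance between two topic distributions (0 = identical, 1 = disjoint)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape:
        raise ValueError("p and q must have the same shape")
    p = p / p.sum()  # normalise in case the inputs do not sum exactly to 1
    q = q / q.sum()
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


# Hypothetical topic proportions inferred for one article and two of its representations.
full_text = [0.7, 0.2, 0.1]
abstract_only = [0.6, 0.3, 0.1]
keywords_only = [0.1, 0.2, 0.7]

print(hellinger_distance(full_text, abstract_only))
print(hellinger_distance(full_text, keywords_only))
```

A smaller distance suggests the representation preserves more of the article's topical content; other measures, such as Jensen-Shannon divergence or cosine similarity, could be substituted.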
