Online Matrix Factorization (OMF) is a fundamental tool for dictionary learning problems, giving an approximate representation of complex data sets in terms of a reduced number of extracted features. Convergence guarantees for most OMF algorithms in the literature assume independence between the data matrices, and the case of dependent data streams remains largely unexplored. In this paper, we show that the well-known OMF algorithm for i.i.d. data streams proposed in \cite{mairal2010online} in fact converges almost surely to the set of critical points of the expected loss function, even when the data matrices form a Markov chain satisfying a mild mixing condition. Furthermore, we extend the convergence result to the case where each optimization subproblem in the algorithm can only be solved approximately. For applications, we demonstrate dictionary learning from a sequence of images generated by a Markov Chain Monte Carlo (MCMC) sampler. Lastly, by combining online non-negative matrix factorization with a recent MCMC algorithm for sampling motifs from networks, we propose a novel framework of Network Dictionary Learning, which extracts `network dictionary patches' that encode the main features of a given network in an online manner. We demonstrate this technique on real-world text data.
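To make the setting concrete, here is a minimal sketch of an online matrix factorization iteration of the kind the abstract refers to, run on a dependent (Markovian) data stream. All names, the AR(1) chain, and the ridge-regression coding step are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def omf_step(x, D, A, B, lam=0.1):
    """One online factorization step (illustrative sketch): code the new
    sample, accumulate sufficient statistics, then update the dictionary
    by block coordinate descent over its columns."""
    k = D.shape[1]
    # Coding step: a ridge-regression surrogate for the sparse code.
    alpha = np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ x)
    # Sufficient statistics summarizing the past stream.
    A += np.outer(alpha, alpha)
    B += np.outer(x, alpha)
    # Dictionary update: column-wise descent, columns kept in the unit ball.
    for j in range(k):
        if A[j, j] > 1e-12:
            D[:, j] += (B[:, j] - D @ A[:, j]) / A[j, j]
            D[:, j] /= max(1.0, np.linalg.norm(D[:, j]))
    return alpha

# A Markovian data stream: an AR(1) chain, so consecutive samples are
# dependent but the chain mixes.
rng = np.random.default_rng(0)
d, k, T = 20, 5, 200
D = rng.standard_normal((d, k))
D /= np.linalg.norm(D, axis=0)
A, B = np.zeros((k, k)), np.zeros((d, k))
x = rng.standard_normal(d)
for _ in range(T):
    x = 0.5 * x + rng.standard_normal(d)  # Markov transition
    omf_step(x, D, A, B)
```

The point of the sketch is only that the iteration consumes one sample at a time and never assumes the samples are independent; the convergence analysis under a mixing condition is the subject of the paper.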