In this paper, we propose a provably correct algorithm for convolutive nonnegative matrix factorization (CNMF) under separability assumptions. CNMF is a convolutive variant of nonnegative matrix factorization (NMF), which functions as an NMF with additional sequential structure. This model is useful in a number of applications, such as audio source separation and neural sequence identification. While a number of heuristic algorithms have been proposed to solve CNMF, to the best of our knowledge no provably correct algorithms have been developed. We present an algorithm that takes advantage of the NMF model underlying CNMF and exploits existing algorithms for separable NMF to provably find a solution under certain conditions. Our approach guarantees the solution in low noise settings, and runs in polynomial time. We illustrate its effectiveness on synthetic datasets, and on a singing bird audio sequence.

Text clustering is arguably one of the most important topics in modern data mining. Nevertheless, text data require tokenization which usually yields a very large and highly sparse term-document matrix, which is usually difficult to process using conventional machine learning algorithms. Methods such as Latent Semantic Analysis have helped mitigate this issue, but are nevertheless not completely stable in practice. As a result, we propose a new feature agglomeration method based on Nonnegative Matrix Factorization. NMF is employed to separate the terms into groups, and then each groups term vectors are agglomerated into a new feature vector. Together, these feature vectors create a new feature space much more suitable for clustering. In addition, we propose a new deterministic initialization for spherical K-Means, which proves very useful for this specific type of data. In order to evaluate the proposed method, we compare it to some of the latest research done in this field, as well as some of the most practiced methods. In our experiments, we conclude that the proposed method either significantly improves clustering performance, or maintains the performance of other methods, while improving stability in results.

Binary data matrices can represent many types of data such as social networks, votes or gene expression. In some cases, the analysis of binary matrices can be tackled with nonnegative matrix factorization (NMF), where the observed data matrix is approximated by the product of two smaller nonnegative matrices. In this context, probabilistic NMF assumes a generative model where the data is usually Bernoulli-distributed. Often, a link function is used to map the factorization to the $[0,1]$ range, ensuring a valid Bernoulli mean parameter. However, link functions have the potential disadvantage to lead to uninterpretable models. Mean-parameterized NMF, on the contrary, overcomes this problem. We propose a unified framework for Bayesian mean-parameterized nonnegative binary matrix factorization models (NBMF). We analyze three models which correspond to three possible constraints that respect the mean-parametrization without the need for link functions. Furthermore, we derive a novel collapsed Gibbs sampler and a collapsed variational algorithm to infer the posterior distribution of the factors. Next, we extend the proposed models to a nonparametric setting where the number of used latent dimensions is automatically driven by the observed data. We analyze the performance of our NBMF methods in multiple datasets for different tasks such as dictionary learning and prediction of missing data. Experiments show that our methods provide similar or superior results than the state of the art, while automatically detecting the number of relevant components.

Recently, an audio-visual speech generative model based on variational autoencoder (VAE) has been proposed, which is combined with a nonnegative matrix factorization (NMF) model for noise variance to perform unsupervised speech enhancement. When visual data is clean, speech enhancement with audio-visual VAE shows a better performance than with audio-only VAE, which is trained on audio-only data. However, audio-visual VAE is not robust against noisy visual data, e.g., when for some video frames, speaker face is not frontal or lips region is occluded. In this paper, we propose a robust unsupervised audio-visual speech enhancement method based on a per-frame VAE mixture model. This mixture model consists of a trained audio-only VAE and a trained audio-visual VAE. The motivation is to skip noisy visual frames by switching to the audio-only VAE model. We present a variational expectation-maximization method to estimate the parameters of the model. Experiments show the promising performance of the proposed method.

Online Matrix Factorization (OMF) is a fundamental tool for dictionary learning problems, giving an approximate representation of complex data sets in terms of a reduced number of extracted features. Convergence guarantees for most of the OMF algorithms in the literature assume independence between data matrices, and the case of a dependent data stream remains largely unexplored. In this paper, we show that the well-known OMF algorithm for i.i.d. stream of data proposed in \cite{mairal2010online}, in fact converges almost surely to the set of critical points of the expected loss function, even when the data matrices form a Markov chain satisfying a mild mixing condition. Furthermore, we extend the convergence result to the case when we can only approximately solve each step of the optimization problems in the algorithm. For applications, we demonstrate dictionary learning from a sequence of images generated by a Markov Chain Monte Carlo (MCMC) sampler. Lastly, by combining online non-negative matrix factorization and a recent MCMC algorithm for sampling motifs from networks, we propose a novel framework of Network Dictionary Learning, which extracts network dictionary patches' from a given network in an online manner that encodes main features of the network. We demonstrate this technique on real-world text data.

We propose a nonparametric model for time series with missing data based on low-rank matrix factorization. The model expresses each instance in a set of time series as a linear combination of a small number of shared basis functions. Constraining the functions and the corresponding coefficients to be nonnegative yields an interpretable low-dimensional representation of the data. A time-smoothing regularization term ensures that the model captures meaningful trends in the data, instead of overfitting short-term fluctuations. The low-dimensional representation makes it possible to detect outliers and cluster the time series according to the interpretable features extracted by the model, and also to perform forecasting via kernel regression. We apply our methodology to a large real-world dataset of infant-sleep data gathered by caregivers with a mobile-phone app. Our analysis automatically extracts daily-sleep patterns consistent with the existing literature. This allows us to compute sleep-development trends for the cohort, which characterize the emergence of circadian sleep and different napping habits. We apply our methodology to detect anomalous individuals, to cluster the cohort into groups with different sleeping tendencies, and to obtain improved predictions of future sleep behavior.

This work explores non-negative matrix factorization based on regularized Poisson models for recommender systems with implicit-feedback data. The properties of Poisson likelihood allow a shortcut for very fast computation and optimization over elements with zero-value when the latent-factor matrices are non-negative, making it a more suitable approach than squared loss for very sparse inputs such as implicit-feedback data. A simple and embarrassingly parallel optimization approach based on proximal gradients is presented, which in large datasets converges 2-3 orders of magnitude faster than its Bayesian counterpart (Hierarchical Poisson Factorization) fit through variational inference techniques, and 1 order of magnitude faster than implicit-ALS fit with the Conjugate Gradient method.

With the growing importance of personalized recommendation, numerous recommendation models have been proposed recently. Among them, Matrix Factorization (MF) based models are the most widely used in the recommendation field due to their high performance. However, MF based models suffer from cold start problems where user-item interactions are sparse. To deal with this problem, content based recommendation models which use the auxiliary attributes of users and items have been proposed. Since these models use auxiliary attributes, they are effective in cold start settings. However, most of the proposed models are either unable to capture complex feature interactions or not properly designed to combine user-item feedback information with content information. In this paper, we propose Self-Attentive Integration Network (SAIN) which is a model that effectively combines user-item feedback information and auxiliary information for recommendation task. In SAIN, a self-attention mechanism is used in the feature-level interaction layer to effectively consider interactions between multiple features, while the information integration layer adaptively combines content and feedback information. The experimental results on two public datasets show that our model outperforms the state-of-the-art models by 2.13%

Online Matrix Factorization (OMF) is a fundamental tool for dictionary learning problems, giving an approximate representation of complex data sets in terms of a reduced number of extracted features. Convergence guarantees for most of the OMF algorithms in the literature assume independence between data matrices, and the case of a dependent data stream remains largely unexplored. In this paper, we show that the well-known OMF algorithm for i.i.d. stream of data proposed in \cite{mairal2010online}, in fact converges almost surely to the set of critical points of the expected loss function, even when the data matrices form a Markov chain satisfying a mild mixing condition. Furthermore, we extend the convergence result to the case when we can only approximately solve each step of the optimization problems in the algorithm. For applications, we demonstrate dictionary learning from a sequence of images generated by a Markov Chain Monte Carlo (MCMC) sampler. Lastly, by combining online non-negative matrix factorization and a recent MCMC algorithm for sampling motifs from networks, we propose a novel framework of Network Dictionary Learning, which extracts network dictionary patches' from a given network in an online manner that encodes main features of the network. We demonstrate this technique on real-world text data.

Online Matrix Factorization (OMF) is a fundamental tool for dictionary learning problems, giving an approximate representation of complex data sets in terms of a reduced number of extracted features. Convergence guarantees for most of the OMF algorithms in the literature assume independence between data matrices, and the case of a dependent data stream remains largely unexplored. In this paper, we show that the well-known OMF algorithm for i.i.d. stream of data proposed in \cite{mairal2010online}, in fact converges almost surely to the set of critical points of the expected loss function, even when the data matrices form a Markov chain satisfying a mild mixing condition. Furthermore, we extend the convergence result to the case when we can only approximately solve each step of the optimization problems in the algorithm. For applications, we demonstrate dictionary learning from a sequence of images generated by a Markov Chain Monte Carlo (MCMC) sampler. Lastly, by combining online non-negative matrix factorization and a recent MCMC algorithm for sampling motifs from networks, we propose a novel framework of Network Dictionary Learning, which extracts network dictionary patches' from a given network in an online manner that encodes main features of the network. We demonstrate this technique on real-world text data.

As the first step in automated natural language processing, representing words and sentences is of central importance and has attracted significant research attention. Different approaches, from the early one-hot and bag-of-words representation to more recent distributional dense and sparse representations, were proposed. Despite the successful results that have been achieved, such vectors tend to consist of uninterpretable components and face nontrivial challenge in both memory and computational requirement in practical applications. In this paper, we designed a novel representation model that projects dense word vectors into a higher dimensional space and favors a highly sparse and binary representation of word vectors with potentially interpretable components, while trying to maintain pairwise inner products between original vectors as much as possible. Computationally, our model is relaxed as a symmetric non-negative matrix factorization problem which admits a fast yet effective solution. In a series of empirical evaluations, the proposed model exhibited consistent improvement and high potential in practical applications.

In the last decade, the digital age has sharply redefined the way we study human behavior. With the advancement of data storage and sensing technologies, electronic records now encompass a diverse spectrum of human activity, ranging from location data, phone and email communication to Twitter activity and open-source contributions on Wikipedia and OpenStreetMap. In particular, the study of the shopping and mobility patterns of individual consumers has the potential to give deeper insight into the lifestyles and infrastructure of the region. Credit card records (CCRs) provide detailed insight into purchase behavior and have been found to have inherent regularity in consumer shopping patterns; call detail records (CDRs) present new opportunities to understand human mobility, analyze wealth, and model social network dynamics. In this chapter, we jointly model the lifestyles of individuals, a more challenging problem with higher variability when compared to the aggregated behavior of city regions. Using collective matrix factorization, we propose a unified dual view of lifestyles. Understanding these lifestyles will not only inform commercial opportunities, but also help policymakers and nonprofit organizations understand the characteristics and needs of the entire region, as well as of the individuals within that region. The applications of this range from targeted advertisements and promotions to the diffusion of digital financial services among low-income groups.

Discriminative models for source separation have recently been shown to produce impressive results. However, when operating on sources outside of the training set, these models can not perform as well and are cumbersome to update. Classical methods like Non-negative Matrix Factorization (NMF) provide modular approaches to source separation that can be easily updated to adapt to new mixture scenarios. In this paper, we generalize NMF to develop end-to-end non-negative auto-encoders and demonstrate how they can be used for source separation. Our experiments indicate that these models deliver comparable separation performance to discriminative approaches, while retaining the modularity of NMF and the modeling flexibility of neural networks.

We consider the setup of nonparametric {\em blind regression} for estimating the entries of a large $m \times n$ matrix, when provided with a small, random fraction of noisy measurements. We assume that all rows $u \in [m]$ and columns $i \in [n]$ of the matrix are associated to latent features $x_{\text{row}}(u)$ and $x_{\text{col}}(i)$ respectively, and the $(u,i)$-th entry of the matrix, $A(u, i)$ is equal to $f(x_{\text{row}}(u), x_{\text{col}}(i))$ for a latent function $f$. Given noisy observations of a small, random subset of the matrix entries, our goal is to estimate the unobserved entries of the matrix as well as to "de-noise" the observed entries. As the main result of this work, we introduce a nearest-neighbor-based estimation algorithm, and establish its consistency when the underlying latent function $f$ is Lipschitz, the underlying latent space is a bounded diameter Polish space, and the random fraction of observed entries in the matrix is at least $\max \left( m^{-1 + \delta}, n^{-1/2 + \delta} \right)$, for any $\delta > 0$. As an important byproduct, our analysis sheds light into the performance of the classical collaborative filtering algorithm for matrix completion, which has been widely utilized in practice. Experiments with the MovieLens and Netflix datasets suggest that our algorithm provides a principled improvement over basic collaborative filtering and is competitive with matrix factorization methods. Our algorithm has a natural extension to the setting of tensor completion via flattening the tensor to matrix. When applied to the setting of image in-painting, which is a $3$-order tensor, we find that our approach is competitive with respect to state-of-art tensor completion algorithms across benchmark images.

Top