Factorization of matrices where the rank of the two factors diverges linearly with their sizes has many applications in diverse areas such as unsupervised representation learning, dictionary learning, or sparse coding. We consider a setting where the two factors are generated from known component-wise independent prior distributions, and the statistician observes a (possibly noisy) component-wise function of their matrix product. In the limit where the dimensions of the matrices tend to infinity but their ratios remain fixed, we expect to be able to derive closed-form expressions for the optimal mean squared error on the estimation of the two factors. However, this remains a very involved mathematical and algorithmic problem. A related, but simpler, problem is extensive-rank matrix denoising, where one aims to reconstruct a matrix with extensive but usually small rank from noisy measurements. In this paper, we approach both of these problems using high-temperature expansions at fixed order parameters. This allows us to clarify how previous attempts at solving these problems failed to find an asymptotically exact solution. We provide a systematic way to derive the corrections to these existing approximations, taking into account the structure of correlations particular to the problem. Finally, we illustrate our approach in detail on the case of extensive-rank matrix denoising. We compare our results with known optimal rotationally-invariant estimators, and show how exact asymptotic calculations of the minimal error can be performed using extensive-rank matrix integrals.
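For concreteness, a rotationally-invariant estimator for matrix denoising keeps the singular vectors of the observation and only shrinks its singular values. The NumPy sketch below illustrates this family on a synthetic extensive-rank signal; the soft-thresholding at the noise bulk edge is a placeholder shrinkage function, not the optimal estimator derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic n x m signal with extensive but comparatively small rank.
n, m, rank = 200, 300, 20
S = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, m)) / np.sqrt(rank)
Delta = 0.5                                              # per-entry noise variance
Y = S + np.sqrt(Delta) * rng.standard_normal((n, m))

# Rotationally-invariant estimator: keep the singular vectors of Y and
# replace each singular value s_i by a shrunken value eta(s_i).
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
bulk_edge = np.sqrt(Delta) * (np.sqrt(n) + np.sqrt(m))   # edge of the noise spectrum
s_shrunk = np.maximum(s - bulk_edge, 0.0)                # placeholder shrinkage eta
S_hat = (U * s_shrunk) @ Vt

print("MSE of the raw observation:", np.mean((Y - S) ** 2))
print("MSE of the RIE sketch:     ", np.mean((S_hat - S) ** 2))
```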

### Related content

In this paper we propose a new algorithm for solving large-scale algebraic Riccati equations with low-rank structure. The algorithm is based on a newly derived closed form of the stabilizing solution, which exhibits an intrinsic Toeplitz structure, and it uses the fast Fourier transform to accelerate the multiplication of a Toeplitz matrix and a vector. It works without unnecessary assumptions, shift-selection strategies, or matrix computations of cubic order with respect to the problem scale. Numerical examples are given to illustrate its features. Besides, we show that it is theoretically equivalent to several algorithms existing in the literature, in the sense that they all produce the same sequence under the same parameter setting.
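The computational primitive behind such an approach, multiplying a Toeplitz matrix by a vector in O(n log n), embeds the Toeplitz matrix into a circulant one, which the FFT diagonalizes. A minimal NumPy sketch of this standard trick (independent of the Riccati solver itself):

```python
import numpy as np

def toeplitz_matvec(c, r, x):
    """Multiply the Toeplitz matrix with first column c and first row r by x
    in O(n log n), via a circulant embedding of size 2n and the FFT."""
    n = len(x)
    col = np.concatenate([c, [0.0], r[:0:-1]])      # first column of the circulant
    y = np.fft.ifft(np.fft.fft(col) * np.fft.fft(np.concatenate([x, np.zeros(n)])))
    return y[:n].real

# Check against the dense product.
rng = np.random.default_rng(1)
n = 5
c, r = rng.standard_normal(n), rng.standard_normal(n)
r[0] = c[0]                                         # T[0, 0] must be consistent
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)] for i in range(n)])
x = rng.standard_normal(n)
assert np.allclose(T @ x, toeplitz_matvec(c, r, x))
```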

Computing sample means on Riemannian manifolds is typically computationally costly. The Fréchet mean offers a generalization of the Euclidean mean to general metric spaces, particularly to Riemannian manifolds. Evaluating the Fréchet mean numerically on a Riemannian manifold requires the computation of geodesics for each sample point. When closed-form expressions do not exist for geodesics, an optimization-based approach is employed. In geometric deep learning, particularly Riemannian convolutional neural networks, a weighted Fréchet mean enters each layer of the network, potentially requiring an optimization in each layer. The weighted diffusion mean offers an alternative weighted sample mean estimator on Riemannian manifolds that does not require the computation of geodesics. Instead, we present a simulation scheme to sample guided diffusion bridges on a product manifold conditioned to intersect at a predetermined time. Such a conditioning is non-trivial since, in general, manifolds cannot be covered by a single chart. Exploiting the exponential chart, the conditioning can be made similar to that in the Euclidean setting.
Mathias Højgaard Jensen, Stefan Sommer
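For contrast, the optimization-based weighted Fréchet mean that the diffusion-mean construction avoids can be sketched on the unit sphere, where the exponential and logarithm maps have closed forms; this is an illustrative geodesic-based baseline, not the paper's bridge-sampling scheme.

```python
import numpy as np

def exp_sphere(p, v):
    """Exponential map on the unit sphere at p applied to tangent vector v."""
    nv = np.linalg.norm(v)
    return p if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

def log_sphere(p, q):
    """Logarithm map: the tangent vector at p pointing along the geodesic to q."""
    w = q - np.dot(p, q) * p                       # component of q orthogonal to p
    nw = np.linalg.norm(w)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    return np.zeros_like(p) if nw < 1e-12 else theta * w / nw

def weighted_frechet_mean(points, weights, iters=100):
    """Karcher fixed-point iteration: step along the weighted mean of log maps."""
    mu = points[0]
    for _ in range(iters):
        mu = exp_sphere(mu, sum(w * log_sphere(mu, x) for w, x in zip(weights, points)))
    return mu

rng = np.random.default_rng(2)
pts = rng.standard_normal((10, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)  # project samples onto the sphere
print(weighted_frechet_mean(pts, np.full(10, 0.1)))
```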

Relaxation methods such as Jacobi or Gauss-Seidel are often applied as smoothers in algebraic multigrid. Incomplete factorizations can also be employed; however, direct triangular solves are comparatively slow on GPUs. Previous work by Anzt et al. \cite{Anzt2015} proposed an iterative approach for solving such sparse triangular systems. However, when using the stationary Jacobi iteration, if the upper or lower triangular factor is highly non-normal, the iterations will diverge. An ILUT smoother is introduced for classical Ruge-Stüben C-AMG that applies Ruiz scaling to mitigate the non-normality of the upper triangular factor. Our approach facilitates the use of Jacobi iteration in place of the inherently sequential triangular solve. Because the scaling is applied to the upper triangular factor as opposed to the global matrix, it can be done locally on an MPI rank for a diagonal block of the global matrix. A performance model is provided along with numerical results for matrices extracted from the PeleLM \cite{PeleLM} pressure continuity solver.
Stephen Thomas, Arielle Carr, Kasia Świrydowicz, Marc Day
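A minimal dense NumPy sketch of the two ingredients, Jacobi sweeps in place of the sequential triangular solve and a Ruiz-style equilibration of the factor, is given below; the paper's actual setting is sparse ILUT factors inside C-AMG on GPUs, so this is purely illustrative.

```python
import numpy as np

def jacobi_triangular_solve(U, b, sweeps=60):
    """Approximately solve U x = b (U upper triangular) with Jacobi sweeps:
    x_{k+1} = D^{-1} (b - (U - D) x_k). For triangular U the iteration matrix
    is nilpotent, so the error vanishes after at most n sweeps in exact
    arithmetic, but a highly non-normal factor causes large transient growth."""
    d = np.diag(U)
    x = b / d
    for _ in range(sweeps):
        x = (b - (U @ x - d * x)) / d
    return x

def ruiz_scale(U, iters=5):
    """Ruiz equilibration: repeatedly scale rows and columns toward unit
    infinity-norm, accumulating the diagonal scalings Dr and Dc."""
    Dr, Dc = np.ones(U.shape[0]), np.ones(U.shape[1])
    for _ in range(iters):
        r = np.sqrt(np.abs(U).max(axis=1))
        c = np.sqrt(np.abs(U).max(axis=0))
        U = U / r[:, None] / c[None, :]
        Dr, Dc = Dr * r, Dc * c
    return U, Dr, Dc                               # U_scaled = Dr^{-1} U Dc^{-1}

rng = np.random.default_rng(3)
n = 50
U = np.triu(rng.standard_normal((n, n))) + 5 * np.eye(n)
b = rng.standard_normal(n)
Us, Dr, Dc = ruiz_scale(U)
x = jacobi_triangular_solve(Us, b / Dr) / Dc       # undo the scalings
print("residual:", np.linalg.norm(U @ x - b))
```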

Cluster-weighted models (CWMs) extend finite mixtures of regressions (FMRs) by allowing the distribution of the covariates to contribute to the clustering process. In a matrix-variate framework, the matrix-variate normal CWM has recently been introduced. However, problems may be encountered when data exhibit skewness or other deviations from normality in the responses, covariates, or both. Thus, we introduce a family of 24 matrix-variate CWMs obtained by allowing both the responses and the covariates to be modelled using one of four existing skewed matrix-variate distributions or the matrix-variate normal distribution. Endowed with greater flexibility, our matrix-variate CWMs are able to handle this kind of data in a more suitable manner. As a by-product, the four skewed matrix-variate FMRs are also introduced. Maximum likelihood parameter estimates are derived using an expectation-conditional maximization algorithm. Parameter recovery, classification assessment, and the capability of the Bayesian information criterion to detect the underlying groups are investigated using simulated data. Lastly, our matrix-variate CWMs, along with the matrix-variate normal CWM and matrix-variate FMRs, are applied to two real datasets for illustrative purposes.
Michael P. B. Gallaugher, Salvatore D. Tomarchio, Paul D. McNicholas, Antonio Punzo
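As a sketch of the E-step in such a mixture, the responsibilities weight each component's joint density of responses and covariates, p(X, Y) = sum_g pi_g p(Y | X, g) p(X | g). The snippet below uses scipy.stats.matrix_normal with a hypothetical element-wise affine regression mean for Y given X, purely to illustrate the CWM decomposition; it is not the paper's parameterization.

```python
import numpy as np
from scipy.stats import matrix_normal

def cwm_responsibilities(X, Y, params):
    """Posterior cluster probabilities of one observation (X, Y) in a
    matrix-variate normal CWM, computed with log-densities for stability.
    Each entry of `params` holds hypothetical keys: mixing weight pi,
    regression mean B0 + B1 * X for Y, and row/column covariances."""
    log_post = []
    for p in params:
        ly = matrix_normal.logpdf(Y, mean=p["B0"] + p["B1"] * X,
                                  rowcov=p["Uy"], colcov=p["Vy"])
        lx = matrix_normal.logpdf(X, mean=p["M"], rowcov=p["Ux"], colcov=p["Vx"])
        log_post.append(np.log(p["pi"]) + ly + lx)
    log_post = np.array(log_post)
    w = np.exp(log_post - log_post.max())          # softmax over clusters
    return w / w.sum()

rng = np.random.default_rng(4)
X = rng.standard_normal((3, 2))
Y = 2 * X + 0.1 * rng.standard_normal((3, 2))
I3, I2, Z = np.eye(3), np.eye(2), np.zeros((3, 2))
params = [dict(pi=0.5, B0=Z, B1=2.0, Uy=I3, Vy=I2, M=Z, Ux=I3, Vx=I2),
          dict(pi=0.5, B0=Z, B1=-2.0, Uy=I3, Vy=I2, M=Z, Ux=I3, Vx=I2)]
print(cwm_responsibilities(X, Y, params))          # strongly favors cluster 0
```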

Matrix factorization plays an important role in machine learning, underlying methods such as non-negative matrix factorization, principal component analysis, and dictionary learning. However, most studies aim to minimize a loss measured by the Euclidean distance, even though in some fields angle distance is known to be more important and critical for the analysis. We propose a method that adds constraints on the factors to unify the Euclidean and angle distances. Due to the non-convexity of the objective and constraints, an optimal solution is not easy to obtain, so we propose a general framework that solves the problem systematically, with provable convergence guarantees under various constraints.
Kai Liu
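One simple instance of such a constraint restricts the columns of one factor to the unit sphere, so that a Euclidean loss on the normalized factor also controls angular distance. Below is a hypothetical alternating projected-gradient sketch of this idea, not the paper's framework or its convergence-guaranteed algorithm.

```python
import numpy as np

def factorize_unit_norm(X, k, iters=2000, lr=1e-2, seed=0):
    """Minimize ||X - W H||_F^2 with unit-norm columns of W by alternating
    gradient steps and projecting W's columns back onto the sphere."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[0], k))
    W /= np.linalg.norm(W, axis=0, keepdims=True)
    H = rng.standard_normal((k, X.shape[1]))
    for _ in range(iters):
        R = W @ H - X                                   # residual
        H -= lr * (W.T @ R)                             # gradient step in H
        W -= lr * (R @ H.T)                             # gradient step in W
        W /= np.linalg.norm(W, axis=0, keepdims=True)   # projection onto sphere
    return W, H

X = np.random.default_rng(5).standard_normal((20, 30))
W, H = factorize_unit_norm(X, k=5)
print("relative error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))
```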

We prove that a variant of the classical Sobolev space of first-order dominating mixed smoothness is equivalent (under a certain condition) to the unanchored ANOVA space on $\mathbb{R}^d$, for $d \geq 1$. Both spaces are Hilbert spaces involving weight functions, which determine the behaviour as different variables tend to $\pm \infty$, and weight parameters, which represent the influence of different subsets of variables. The unanchored ANOVA space on $\mathbb{R}^d$ was initially introduced by Nichols & Kuo in 2014 to analyse the error of quasi-Monte Carlo (QMC) approximations for integrals on unbounded domains; whereas the classical Sobolev space of dominating mixed smoothness was used as the setting in a series of papers by Griebel, Kuo & Sloan on the smoothing effect of integration, in an effort to develop a rigorous theory on why QMC methods work so well for certain non-smooth integrands with kinks or jumps coming from option pricing problems. In this same setting, Griewank, Kuo, Leövey & Sloan in 2018 subsequently extended these ideas by developing a practical smoothing by preintegration technique to approximate integrals of such functions with kinks or jumps. We first prove the equivalence in one dimension (itself a non-trivial task), before following a similar, but more complicated, strategy to prove the equivalence for general dimensions. As a consequence of this equivalence, we analyse applying QMC combined with a preintegration step to approximate the fair price of an Asian option, and prove that the error of such an approximation using $N$ points converges at a rate close to $1/N$.
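To make the application concrete, the snippet below prices an arithmetic Asian call with scrambled Sobol points via scipy.stats.qmc. All parameter values are illustrative, and the preintegration step studied in the paper (integrating out one variable exactly so that the kink in the payoff is smoothed before applying QMC) is omitted.

```python
import numpy as np
from scipy.stats import norm, qmc

# Arithmetic Asian call under Black-Scholes, priced with scrambled Sobol QMC.
S0, K, r, sigma, T, d = 100.0, 100.0, 0.05, 0.2, 1.0, 16   # d monitoring dates
n, dt = 2**14, T / d

U = qmc.Sobol(d=d, scramble=True, seed=0).random(n)   # n points in [0, 1)^d
Z = norm.ppf(U)                                       # Gaussian increments
logS = np.log(S0) + np.cumsum((r - 0.5 * sigma**2) * dt
                              + sigma * np.sqrt(dt) * Z, axis=1)
avg = np.exp(logS).mean(axis=1)                       # arithmetic average price
payoff = np.exp(-r * T) * np.maximum(avg - K, 0.0)    # discounted call payoff
print("QMC price estimate:", payoff.mean())
```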

In this work, we study the Neural Tangent Kernel (NTK) of Matrix Product States (MPS) and the convergence of this NTK in the limit of infinite bond dimension. We prove that the NTK of MPS asymptotically converges to a constant matrix during the gradient descent (training) process (and also at initialization) as the bond dimensions of the MPS go to infinity, by observing that the variation of the tensors in the MPS asymptotically goes to zero during training in this limit. By showing the positive-definiteness of the NTK of MPS, the convergence of the MPS during training in function space (the space of functions represented by MPS) is guaranteed without any extra assumptions on the data set. We then consider the settings of (supervised) Regression with Mean Square Error (RMSE) and (unsupervised) Born Machines (BM) and analyze their dynamics in the infinite bond dimension limit. The ordinary differential equations (ODEs) which describe the dynamics of the responses of the MPS in the RMSE and BM settings are derived and solved in closed form. For the regression, we consider Mercer kernels (Gaussian kernels) and find that the evolution of the mean of the responses of the MPS follows the largest eigenvalue of the NTK. Due to the orthogonality of the kernel functions in BM, the evolution of different modes (samples) decouples and the "characteristic time" of convergence in training is obtained.
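At finite bond dimension, the empirical NTK can be probed directly: it is the Gram matrix of the output's parameter gradients. The toy sketch below contracts a small MPS with local features (1, x_i) and approximates the gradients by finite differences; it is a small-scale illustration, not the paper's infinite-bond-dimension analysis.

```python
import numpy as np

def mps_output(params, x):
    """Scalar MPS output: contract site matrices A[0] + x_i * A[1] along the
    bond dimension, with all-ones boundary vectors."""
    D = params[0].shape[1]
    v = np.ones(D)
    for A, xi in zip(params, x):
        v = v @ (A[0] + xi * A[1])
    return v @ np.ones(D)

def empirical_ntk(params, xs, eps=1e-5):
    """K[a, b] = <grad_theta f(x_a), grad_theta f(x_b)>, with gradients
    approximated by central finite differences over flattened parameters."""
    shapes = [A.shape for A in params]
    flat = np.concatenate([A.ravel() for A in params])
    def f(theta, x):
        tensors, i = [], 0
        for s in shapes:
            size = int(np.prod(s))
            tensors.append(theta[i:i + size].reshape(s))
            i += size
        return mps_output(tensors, x)
    J = np.array([[(f(flat + eps * e, x) - f(flat - eps * e, x)) / (2 * eps)
                   for e in np.eye(flat.size)] for x in xs])
    return J @ J.T

rng = np.random.default_rng(6)
N, D = 4, 3                                     # sites and bond dimension
params = [rng.standard_normal((2, D, D)) / np.sqrt(D) for _ in range(N)]
xs = rng.standard_normal((3, N))                # three inputs of length N
print(empirical_ntk(params, xs))
```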

Disobeying the classical wisdom of statistical learning theory, modern deep neural networks generalize well even though they typically contain millions of parameters. Recently, it has been shown that the trajectories of iterative optimization algorithms can possess fractal structures, and their generalization error can be formally linked to the complexity of such fractals. This complexity is measured by the fractal's intrinsic dimension, a quantity usually much smaller than the number of parameters in the network. Even though this perspective provides an explanation for why overparametrized networks would not overfit, computing the intrinsic dimension (e.g., for monitoring generalization during training) is a notoriously difficult task, where existing methods typically fail even in moderate ambient dimensions. In this study, we consider this problem from the lens of topological data analysis (TDA) and develop a generic computational tool that is built on rigorous mathematical foundations. By making a novel connection between learning theory and TDA, we first illustrate that the generalization error can be equivalently bounded in terms of a notion called the 'persistent homology dimension' (PHD), where, compared with prior work, our approach does not require any additional geometrical or statistical assumptions on the training dynamics. Then, by utilizing recently established theoretical results and TDA tools, we develop an efficient algorithm to estimate PHD at the scale of modern deep neural networks and further provide visualization tools to help understand generalization in deep learning. Our experiments show that the proposed approach can efficiently compute a network's intrinsic dimension in a variety of settings, which is predictive of the generalization error.
Tolga Birdal, Aaron Lou, Leonidas Guibas, Umut Şimşekli
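One concrete route to the PHD estimate uses the fact that the zero-dimensional persistence lifetimes of a Vietoris-Rips filtration are exactly the edge lengths of the Euclidean minimum spanning tree, whose alpha-weighted total length scales as n^((d - alpha)/d) for n points of intrinsic dimension d. A minimal SciPy sketch of this estimator, on an illustrative point cloud:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def ph0_dimension(points, alpha=1.0, seed=0):
    """Estimate the PH0 (persistent homology) dimension: fit the slope of
    log E_alpha(n) against log n, where E_alpha(n) is the sum of MST edge
    lengths raised to alpha over subsamples of size n, and invert
    slope = (d - alpha) / d."""
    rng = np.random.default_rng(seed)
    sizes = np.unique(np.geomspace(32, len(points), 8).astype(int))
    logE = []
    for n in sizes:
        sub = points[rng.choice(len(points), size=n, replace=False)]
        mst = minimum_spanning_tree(squareform(pdist(sub)))
        logE.append(np.log((mst.data ** alpha).sum()))
    slope = np.polyfit(np.log(sizes), logE, 1)[0]
    return alpha / (1.0 - slope)

# A 2-D plane embedded in R^10: the estimate should come out close to 2.
rng = np.random.default_rng(7)
pts = rng.random((2000, 2)) @ rng.standard_normal((2, 10))
print("estimated intrinsic dimension:", ph0_dimension(pts))
```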

Several recent applications of optimal transport (OT) theory to machine learning have relied on regularization, notably entropy and the Sinkhorn algorithm. Because matrix-vector products are pervasive in the Sinkhorn algorithm, several works have proposed to \textit{approximate} kernel matrices appearing in its iterations using low-rank factors. Another route lies instead in imposing low-rank constraints on the feasible set of couplings considered in OT problems, with no approximation of the cost or kernel matrices. This route was first explored by Forrow et al., 2018, who proposed an algorithm tailored for the squared Euclidean ground cost, using a proxy objective that can be solved through the machinery of regularized 2-Wasserstein barycenters. Building on this, we introduce in this work a generic approach that aims at solving, in full generality, the OT problem under low-rank constraints with arbitrary costs. Our algorithm relies on an explicit factorization of low-rank couplings as a product of \textit{sub-coupling} factors linked by a common marginal; similar to an NMF approach, we alternately update these factors. We prove the non-asymptotic stationary convergence of this algorithm and illustrate its efficiency on benchmark experiments.
Meyer Scetbon, Marco Cuturi, Gabriel Peyré
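For reference, the Sinkhorn iterations mentioned above alternate two diagonal scalings of the Gibbs kernel K = exp(-C / eps), and the matrix-vector products K @ v and K.T @ u are exactly the cost that low-rank approaches factorize or constrain away. A minimal sketch with an illustrative squared Euclidean ground cost:

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, iters=500):
    """Entropy-regularized OT: scale the Gibbs kernel until the coupling
    diag(u) K diag(v) has marginals a and b; returns the transport plan."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(8)
x, y = rng.random(50), rng.random(60)
C = (x[:, None] - y[None, :]) ** 2            # squared Euclidean ground cost
a, b = np.full(50, 1 / 50), np.full(60, 1 / 60)
P = sinkhorn(a, b, C)
print(np.abs(P.sum(axis=1) - a).max())        # row marginals match a
```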
