We propose a novel scheme for efficient Dirac mixture modeling of distributions on unit hyperspheres. A so-called hyperspherical localized cumulative distribution (HLCD) is introduced as a local and smooth characterization of the underlying continuous density in hyperspherical domains. Based on HLCD, a manifold-adapted modification of the Cram\'er-von Mises distance (HCvMD) is established to measure the statistical divergence between two Dirac mixtures of arbitrary dimensions. Given a (source) Dirac mixture with many components representing an unknown hyperspherical distribution, a (target) Dirac mixture with fewer components is obtained via matching the source in the sense of least HCvMD. As the number of target Dirac components is configurable, the underlying distributions is represented in a more efficient and informative way. Based upon this hyperspherical Dirac mixture reapproximation (HDMR), we derive a density estimation method and a recursive filter. For density estimation, a maximum likelihood method is provided to reconstruct the underlying continuous distribution in the form of a von Mises-Fisher mixture. For recursive filtering, we introduce the hyperspherical reapproximation discrete filter (HRDF) for nonlinear hyperspherical estimation of dynamic systems under unknown system noise of arbitrary form. Simulations show that the HRDF delivers superior tracking performance over filters using sequential Monte Carlo and parametric modeling.

### 相关内容

Due to the curse of dimensionality and the limitation on training data, approximating high-dimensional functions is a very challenging task even for powerful deep neural networks. Inspired by the Nonlinear Level set Learning (NLL) method that uses the reversible residual network (RevNet), in this paper we propose a new method of Dimension Reduction via Learning Level Sets (DRiLLS) for function approximation. Our method contains two major components: one is the pseudo-reversible neural network (PRNN) module that effectively transforms high-dimensional input variables to low-dimensional active variables, and the other is the synthesized regression module for approximating function values based on the transformed data in the low-dimensional space. The PRNN not only relaxes the invertibility constraint of the nonlinear transformation present in the NLL method due to the use of RevNet, but also adaptively weights the influence of each sample and controls the sensitivity of the function to the learned active variables. The synthesized regression uses Euclidean distance in the input space to select neighboring samples, whose projections on the space of active variables are used to perform local least-squares polynomial fitting. This helps to resolve numerical oscillation issues present in traditional local and global regressions. Extensive experimental results demonstrate that our DRiLLS method outperforms both the NLL and Active Subspace methods, especially when the target function possesses critical points in the interior of its input domain.

This article studies a priori error analysis for linear parabolic interface problems with measure data in time in a bounded convex polygonal domain in $\mathbb{R}^2$. We have used the standard continuous fitted finite element discretization for the space. Due to the low regularity of the data of the problem, the solution possesses very low regularity in the entire domain. A priori error bound in the $L^2(L^2(\Omega))$-norm for the spatially discrete finite element approximations are derived under minimal regularity with the help of the $L^2$ projection operators and the duality argument. The interfaces are assumed to be smooth for our purpose.

We consider the problem of clustering mixtures of mean-separated Gaussians in high dimensions. We are given samples from a mixture of $k$ identity covariance Gaussians, so that the minimum pairwise distance between any two pairs of means is at least $\Delta$, for some parameter $\Delta > 0$, and the goal is to recover the ground truth clustering of these samples. It is folklore that separation $\Delta = \Theta (\sqrt{\log k})$ is both necessary and sufficient to recover a good clustering, at least information theoretically. However, the estimators which achieve this guarantee are inefficient. We give the first algorithm which runs in polynomial time, and which almost matches this guarantee. More precisely, we give an algorithm which takes polynomially many samples and time, and which can successfully recover a good clustering, so long as the separation is $\Delta = \Omega (\log^{1/2 + c} k)$, for any $c > 0$. Previously, polynomial time algorithms were only known for this problem when the separation was polynomial in $k$, and all algorithms which could tolerate $\textsf{poly}( \log k )$ separation required quasipolynomial time. We also extend our result to mixtures of translations of a distribution which satisfies the Poincar\'{e} inequality, under additional mild assumptions. Our main technical tool, which we believe is of independent interest, is a novel way to implicitly represent and estimate high degree moments of a distribution, which allows us to extract important information about high-degree moments without ever writing down the full moment tensors explicitly.

We investigate the Fisher information matrix (FIM) of one hidden layer networks with the ReLU activation function and obtain an approximate spectral decomposition of FIM under certain conditions. From this decomposition, we can approximate the main eigenvalues and eigenvectors. We confirmed by numerical simulation that the obtained decomposition is approximately correct when the number of hidden nodes is about 10000.

Independent Component Analysis (ICA) is intended to recover the mutually independent sources from their linear mixtures, and F astICA is one of the most successful ICA algorithms. Although it seems reasonable to improve the performance of F astICA by introducing more nonlinear functions to the negentropy estimation, the original fixed-point method (approximate Newton method) in F astICA degenerates under this circumstance. To alleviate this problem, we propose a novel method based on the second-order approximation of minimum discrimination information (MDI). The joint maximization in our method is consisted of minimizing single weighted least squares and seeking unmixing matrix by the fixed-point method. Experimental results validate its efficiency compared with other popular ICA algorithms.

We present a novel class of projected methods, to perform statistical analysis on a data set of probability distributions on the real line, with the 2-Wasserstein metric. We focus in particular on Principal Component Analysis (PCA) and regression. To define these models, we exploit a representation of the Wasserstein space closely related to its weak Riemannian structure, by mapping the data to a suitable linear space and using a metric projection operator to constrain the results in the Wasserstein space. By carefully choosing the tangent point, we are able to derive fast empirical methods, exploiting a constrained B-spline approximation. As a byproduct of our approach, we are also able to derive faster routines for previous work on PCA for distributions. By means of simulation studies, we compare our approaches to previously proposed methods, showing that our projected PCA has similar performance for a fraction of the computational cost and that the projected regression is extremely flexible even under misspecification. Several theoretical properties of the models are investigated and asymptotic consistency is proven. Two real world applications to Covid-19 mortality in the US and wind speed forecasting are discussed.

We investigate a clustering problem with data from a mixture of Gaussians that share a common but unknown, and potentially ill-conditioned, covariance matrix. We start by considering Gaussian mixtures with two equally-sized components and derive a Max-Cut integer program based on maximum likelihood estimation. We prove its solutions achieve the optimal misclassification rate when the number of samples grows linearly in the dimension, up to a logarithmic factor. However, solving the Max-cut problem appears to be computationally intractable. To overcome this, we develop an efficient spectral algorithm that attains the optimal rate but requires a quadratic sample size. Although this sample complexity is worse than that of the Max-cut problem, we conjecture that no polynomial-time method can perform better. Furthermore, we gather numerical and theoretical evidence that supports the existence of a statistical-computational gap. Finally, we generalize the Max-Cut program to a $k$-means program that handles multi-component mixtures with possibly unequal weights. It enjoys similar optimality guarantees for mixtures of distributions that satisfy a transportation-cost inequality, encompassing Gaussian and strongly log-concave distributions.

Simulation-based inference with conditional neural density estimators is a powerful approach to solving inverse problems in science. However, these methods typically treat the underlying forward model as a black box, with no way to exploit geometric properties such as equivariances. Equivariances are common in scientific models, however integrating them directly into expressive inference networks (such as normalizing flows) is not straightforward. We here describe an alternative method to incorporate equivariances under joint transformations of parameters and data. Our method -- called group equivariant neural posterior estimation (GNPE) -- is based on self-consistently standardizing the "pose" of the data while estimating the posterior over parameters. It is architecture-independent, and applies both to exact and approximate equivariances. As a real-world application, we use GNPE for amortized inference of astrophysical binary black hole systems from gravitational-wave observations. We show that GNPE achieves state-of-the-art accuracy while reducing inference times by three orders of magnitude.

The Variational Auto-Encoder (VAE) is one of the most used unsupervised machine learning models. But although the default choice of a Gaussian distribution for both the prior and posterior represents a mathematically convenient distribution often leading to competitive results, we show that this parameterization fails to model data with a latent hyperspherical structure. To address this issue we propose using a von Mises-Fisher (vMF) distribution instead, leading to a hyperspherical latent space. Through a series of experiments we show how such a hyperspherical VAE, or $\mathcal{S}$-VAE, is more suitable for capturing data with a hyperspherical latent structure, while outperforming a normal, $\mathcal{N}$-VAE, in low dimensions on other data types.

This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.

Yuankai Teng,Zhu Wang,Lili Ju,Anthony Gruber,Guannan Zhang
0+阅读 · 12月2日
Yoshinari Takeishi,Masazumi Iida,Jun'ichi Takeuchi
0+阅读 · 11月30日
Damek Davis,Mateo Díaz,Kaizheng Wang
0+阅读 · 11月29日
Maximilian Dax,Stephen R. Green,Jonathan Gair,Michael Deistler,Bernhard Schölkopf,Jakob H. Macke
0+阅读 · 11月25日
Tim R. Davidson,Luca Falorsi,Nicola De Cao,Thomas Kipf,Jakub M. Tomczak
3+阅读 · 2018年9月26日
Joel A. Tropp,Alp Yurtsever,Madeleine Udell,Volkan Cevher
4+阅读 · 2018年1月2日

11+阅读 · 2019年4月13日
CreateAMind
6+阅读 · 2019年1月18日
CreateAMind
7+阅读 · 2019年1月7日
CreateAMind
21+阅读 · 2019年1月4日
CreateAMind
8+阅读 · 2018年12月10日
CreateAMind
24+阅读 · 2018年9月12日
CreateAMind
3+阅读 · 2018年4月15日
CreateAMind
5+阅读 · 2017年8月4日
Top