Machine learning practitioners invest significant manual and computational resources in finding suitable learning rates for optimization algorithms. We provide a probabilistic motivation, in terms of Gaussian inference, for popular stochastic first-order methods. As an important special case, this inference perspective recovers the Polyak step with a general metric. It also allows us to relate the learning rate to a dimensionless quantity that can be automatically adapted during training by a control algorithm. The resulting meta-algorithm is shown to adapt learning rates in a robust manner across a large range of initial values when applied to deep learning benchmark problems.

Related content

The gradient descent algorithm multiplies the gradient by a scalar called the learning rate (sometimes also called the step size) to determine the position of the next point. If the learning rate is too small, convergence is too slow; if the learning rate is too large, the cost function oscillates.
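
To make the role of the learning rate concrete, here is a minimal NumPy sketch on a toy quadratic, contrasting a fixed learning rate with a Polyak-style step that sets the step size from the current loss gap. The objective, constants, and update rules are illustrative assumptions, not the meta-algorithm of the paper above.

```python
import numpy as np

# Minimal sketch: gradient descent on the quadratic f(x) = 0.5 * ||x||^2,
# comparing a fixed learning rate with a Polyak-style step
# eta_t = (f(x_t) - f_star) / ||grad f(x_t)||^2, with f_star = 0 here.
# Problem and step rules are illustrative, not the paper's meta-algorithm.

def f(x):
    return 0.5 * np.dot(x, x)

def grad(x):
    return x

x_fixed = np.array([5.0, -3.0])
x_polyak = x_fixed.copy()
lr = 0.1      # fixed learning rate: too small -> slow, too large -> oscillation
f_star = 0.0  # known optimal value for this toy problem

for _ in range(50):
    g = grad(x_fixed)
    x_fixed = x_fixed - lr * g                      # x <- x - eta * grad

    g = grad(x_polyak)
    eta_t = (f(x_polyak) - f_star) / (np.dot(g, g) + 1e-12)
    x_polyak = x_polyak - eta_t * g                 # step size set by the loss gap

print("fixed-step loss :", f(x_fixed))
print("Polyak-step loss:", f(x_polyak))
```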

We consider the problem of contextual bandits with stochastic experts, which is a variation of the traditional stochastic contextual bandit with experts problem. In our problem setting, we assume access to a class of stochastic experts, where each expert is a conditional distribution over the arms given a context. We propose upper-confidence bound (UCB) algorithms for this problem, which employ two different importance sampling based estimators for the mean reward for each expert. Both these estimators leverage information leakage among the experts, thus using samples collected under all the experts to estimate the mean reward of any given expert. This leads to instance-dependent regret bounds of $\mathcal{O}\left(\lambda(\pmb{\mu})\mathcal{M}\log T/\Delta \right)$, where $\lambda(\pmb{\mu})$ is a term that depends on the mean rewards of the experts, $\Delta$ is the smallest gap between the mean reward of the optimal expert and the rest, and $\mathcal{M}$ quantifies the information leakage among the experts. We show that under some assumptions $\lambda(\pmb{\mu})$ is typically $\mathcal{O}(\log N)$, where $N$ is the number of experts. We implement our algorithm with stochastic experts generated from cost-sensitive classification oracles and show superior empirical performance on real-world datasets, when compared to other state-of-the-art contextual bandit algorithms.
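
As a rough illustration of the idea (not the paper's estimators or confidence bounds), the following sketch runs a UCB loop over a few toy stochastic experts: a single observed reward updates every expert's importance-weighted estimate, and an effective-sample-size heuristic sets the exploration bonus. All policies, rewards, and constants are hypothetical.

```python
import numpy as np

# Minimal sketch of a UCB loop over stochastic experts, where each expert is a
# known conditional distribution over arms given the context. One logged sample
# updates every expert's importance-weighted estimate ("information leakage").
# The experts, rewards, and confidence width are illustrative assumptions.

rng = np.random.default_rng(0)
n_arms, n_experts, horizon = 5, 3, 5000

def expert_probs(i, context):
    """Toy softmax policy of expert i over the arms, given a scalar context."""
    logits = np.sin((i + 1) * context * np.arange(1, n_arms + 1))
    p = np.exp(logits - logits.max())
    return p / p.sum()

def draw_reward(arm, context):
    return float(rng.random() < 0.2 + 0.6 * abs(np.sin(context + arm)))

wr_sum = np.zeros(n_experts)   # sum of importance weight * reward
w_sum = np.zeros(n_experts)    # sum of importance weights
w2_sum = np.zeros(n_experts)   # sum of squared weights (effective sample size)

for t in range(1, horizon + 1):
    context = rng.random()
    probs = np.array([expert_probs(i, context) for i in range(n_experts)])

    mean = wr_sum / np.maximum(w_sum, 1e-12)
    ess = w_sum ** 2 / np.maximum(w2_sum, 1e-12)
    ucb = mean + np.sqrt(2.0 * np.log(t) / np.maximum(ess, 1.0))
    chosen = int(np.argmax(ucb)) if t > n_experts else t % n_experts

    arm = rng.choice(n_arms, p=probs[chosen])
    reward = draw_reward(arm, context)

    # Importance weights pi_i(arm | context) / pi_chosen(arm | context)
    # update every expert's estimate from this single sample.
    w = probs[:, arm] / probs[chosen, arm]
    wr_sum += w * reward
    w_sum += w
    w2_sum += w ** 2

print("estimated mean reward per expert:", wr_sum / np.maximum(w_sum, 1e-12))
```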

Metric clustering is fundamental in areas ranging from Combinatorial Optimization and Data Mining, to Machine Learning and Operations Research. However, in a variety of situations we may have additional requirements or knowledge, distinct from the underlying metric, regarding which pairs of points should be clustered together. To capture and analyze such scenarios, we introduce a novel family of \emph{stochastic pairwise constraints}, which we incorporate into several essential clustering objectives (radius/median/means). Moreover, we demonstrate that these constraints can succinctly model an intriguing collection of applications, including among others \emph{Individual Fairness} in clustering and \emph{Must-link} constraints in semi-supervised learning. Our main result consists of a general framework that yields approximation algorithms with provable guarantees for important clustering objectives, while at the same time producing solutions that respect the stochastic pairwise constraints. Furthermore, for certain objectives we devise improved results in the case of Must-link constraints, which are also the best possible from a theoretical perspective. Finally, we present experimental evidence that validates the effectiveness of our algorithms.

Existing deterministic variational inference approaches for diffusion processes use simple proposals and target the marginal density of the posterior. We construct the variational process as a controlled version of the prior process and approximate the posterior by a set of moment functions. In combination with moment closure, the smoothing problem is reduced to a deterministic optimal control problem. Exploiting the path-wise Fisher information, we propose an optimization procedure that corresponds to a natural gradient descent in the variational parameters. Our approach allows for richer variational approximations that extend to state-dependent diffusion terms. The classical Gaussian process approximation is recovered as a special case.

We study the linear stability of solutions to the Navier–Stokes equations with stochastic viscosity. Specifically, we assume that the viscosity is given in the form of a stochastic expansion. Stability analysis requires a solution of the steady-state Navier–Stokes equations and then leads to a generalized eigenvalue problem, from which we wish to characterize the real part of the rightmost eigenvalue. While this can be achieved by Monte Carlo simulation, due to its computational cost we study three surrogates based on generalized polynomial chaos, Gaussian process regression, and a shallow neural network. The linear stability results obtained by the surrogates are compared to those of Monte Carlo simulation in a set of numerical experiments.
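
The Gaussian process surrogate mentioned above can be sketched as follows, assuming scikit-learn's GaussianProcessRegressor and a toy placeholder rightmost_eig_real standing in for the expensive steady-state and eigenvalue solves; the dimensions and sampling ranges are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Minimal sketch of the surrogate idea: learn a map from the stochastic-expansion
# coefficients of the viscosity to the real part of the rightmost eigenvalue,
# then query the cheap surrogate instead of solving the generalized eigenvalue
# problem for every Monte Carlo sample. `rightmost_eig_real` is a hypothetical
# placeholder for the expensive Navier-Stokes + eigenvalue computation.

rng = np.random.default_rng(0)

def rightmost_eig_real(xi):
    # Stand-in for: steady-state Navier-Stokes solve + generalized eigenvalue solve.
    return -0.5 + 0.3 * xi[0] - 0.2 * xi[1] ** 2 + 0.05 * np.sin(3 * xi[2])

d, n_train = 3, 40
X_train = rng.uniform(-1, 1, size=(n_train, d))   # viscosity expansion coefficients
y_train = np.array([rightmost_eig_real(x) for x in X_train])

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(length_scale=1.0),
                              normalize_y=True).fit(X_train, y_train)

# Surrogate-based Monte Carlo: the std quantifies the surrogate's own uncertainty.
X_mc = rng.uniform(-1, 1, size=(20_000, d))
mean, std = gp.predict(X_mc, return_std=True)
print("estimated P(rightmost eigenvalue real part > 0) =", float(np.mean(mean > 0.0)))
```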

The global optimization of a high-dimensional black-box function under black-box constraints is a pervasive task in machine learning, control, and engineering. These problems are challenging since the feasible set is typically non-convex and hard to find, in addition to the curses of dimensionality and the heterogeneity of the underlying functions. In particular, these characteristics dramatically impact the performance of Bayesian optimization methods, which have otherwise become the de facto standard for sample-efficient optimization in unconstrained settings, leaving practitioners with evolutionary strategies or heuristics. We propose the scalable constrained Bayesian optimization (SCBO) algorithm that overcomes the above challenges and pushes the applicability of Bayesian optimization far beyond the state of the art. A comprehensive experimental evaluation demonstrates that SCBO achieves excellent results on a variety of benchmarks. As part of this evaluation, we propose two new control problems that we expect to be of independent value to the scientific community.

K-Means is one of the most used algorithms for data clustering and the usual clustering method for benchmarking. Despite its wide application, it is well known that it suffers from a series of disadvantages: it is only able to find local minima, and the positions of the initial cluster centres (centroids) can greatly affect the clustering solution. Over the years many K-Means variations and initialisation techniques have been proposed, with different degrees of complexity. In this study we focus on common K-Means variations along with a range of deterministic and stochastic initialisation techniques. We show that, on average, more sophisticated initialisation techniques alleviate the need for complex clustering methods. Furthermore, deterministic methods perform better than stochastic methods. However, there is a trade-off: less sophisticated stochastic methods, executed multiple times, can result in better clustering. Factoring in execution time, deterministic methods can be competitive and result in a good clustering solution. These conclusions are obtained through extensive benchmarking using a range of synthetic model generators and real-world data sets.
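
A small scikit-learn experiment in this spirit, comparing k-means++ seeding against naive random initialisation with and without restarts on synthetic blobs (the dataset and settings are illustrative, not the benchmark suite used in the study):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative comparison of initialisation strategies: 'k-means++' is the more
# sophisticated stochastic seeding, 'random' is the naive stochastic baseline,
# and n_init controls how many restarts are run before keeping the best result.

X, _ = make_blobs(n_samples=2000, centers=10, cluster_std=1.5, random_state=0)

for init, n_init in [("k-means++", 1), ("k-means++", 10), ("random", 1), ("random", 10)]:
    km = KMeans(n_clusters=10, init=init, n_init=n_init, random_state=0).fit(X)
    print(f"init={init:10s} restarts={n_init:2d}  inertia={km.inertia_:.1f}")
```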

Distance metric learning based on triplet loss has been applied with success in a wide range of applications such as face recognition, image retrieval, speaker change detection and, more recently, recommendation with the CML model. However, as we show in this article, CML requires large batches to work reasonably well because its uniform negative sampling strategy for selecting triplets is too simplistic. Due to memory limitations, this makes it difficult to scale in high-dimensional scenarios. To alleviate this problem, we propose a 2-stage negative sampling strategy which finds triplets that are highly informative for learning. Our strategy allows CML to work effectively in terms of accuracy and popularity bias, even when the batch size is an order of magnitude smaller than what would be needed with the default uniform sampling. We demonstrate the suitability of the proposed strategy for recommendation and observe consistent positive results across various datasets.
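
A minimal sketch of the general two-stage pattern, assuming toy item embeddings and a hypothetical pool size: draw a small uniform candidate pool, then keep the candidate the current model finds hardest. This illustrates the idea only and is not the exact procedure proposed in the article.

```python
import numpy as np

# Two-stage negative sampling sketch for a triplet / metric-learning loss:
# stage 1 draws a small uniform candidate pool (cheap, avoids scoring the full
# catalogue); stage 2 keeps the candidate closest to the user under the current
# embeddings, i.e. the most informative negative in the pool.
# Shapes, pool size, and embeddings are illustrative assumptions.

rng = np.random.default_rng(0)
n_items, dim, pool_size = 10_000, 64, 20
item_emb = rng.normal(size=(n_items, dim)).astype(np.float32)

def sample_negative(user_vec, positive_item):
    # Stage 1: uniform candidate pool.
    candidates = rng.choice(n_items, size=pool_size, replace=False)
    candidates = candidates[candidates != positive_item]
    # Stage 2: hardest negative among the pool (smallest distance to the user).
    d = np.linalg.norm(item_emb[candidates] - user_vec, axis=1)
    return int(candidates[np.argmin(d)])

user_vec = rng.normal(size=dim).astype(np.float32)
print("chosen negative item:", sample_negative(user_vec, positive_item=123))
```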

UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data. The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance. Furthermore, UMAP has no computational restrictions on embedding dimension, making it viable as a general purpose dimension reduction technique for machine learning.
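
A typical usage sketch with the umap-learn package (the parameter values shown are common defaults, chosen here for illustration):

```python
import umap                               # pip install umap-learn
from sklearn.datasets import load_digits

# Minimal sketch of UMAP as a drop-in dimension reducer on a small image
# dataset; parameters are illustrative, not a recommendation from the paper.
digits = load_digits()
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, random_state=42)
embedding = reducer.fit_transform(digits.data)   # shape (n_samples, 2)
print(embedding.shape)
```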

We propose accelerated randomized coordinate descent algorithms for stochastic optimization and online learning. Our algorithms have significantly less per-iteration complexity than the known accelerated gradient algorithms. The proposed algorithms for online learning have better regret performance than the known randomized online coordinate descent algorithms. Furthermore, the proposed algorithms for stochastic optimization exhibit as good convergence rates as the best known randomized coordinate descent algorithms. We also show simulation results to demonstrate performance of the proposed algorithms.
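
For orientation, a minimal (non-accelerated) randomized coordinate descent sketch on a least-squares objective is given below; each iteration touches a single column of the data, which is the per-iteration cost advantage the abstract refers to. The accelerated variants proposed in the paper add momentum terms not shown here.

```python
import numpy as np

# Randomized coordinate descent on f(x) = 0.5 * ||A x - b||^2: each iteration
# updates one random coordinate using its partial derivative and coordinate-wise
# Lipschitz constant, so the cost per iteration is one column, not a full gradient.
# Illustrative baseline only; no acceleration/momentum is included.

rng = np.random.default_rng(0)
m, n = 200, 50
A = rng.normal(size=(m, n))
b = rng.normal(size=m)
L = (A ** 2).sum(axis=0)           # coordinate-wise Lipschitz constants

x = np.zeros(n)
residual = A @ x - b               # maintained incrementally
for _ in range(20_000):
    j = rng.integers(n)
    g_j = A[:, j] @ residual       # partial derivative along coordinate j
    step = g_j / L[j]
    x[j] -= step
    residual -= step * A[:, j]     # keep the residual consistent with x

print("objective:", 0.5 * np.dot(residual, residual))
```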

Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous-time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, such as the simplex, the time-discretisation error can dominate when we are near the boundary of the space. We demonstrate that while current SGMCMC methods for the simplex perform well in certain cases, they struggle with sparse simplex spaces, i.e. when many of the components are close to zero. However, most popular large-scale applications of Bayesian inference on simplex spaces, such as network or topic models, are sparse. We argue that this poor performance is due to the biases of SGMCMC caused by the discretisation error. To get around this, we propose the stochastic CIR process, which removes all discretisation error, and we prove that samples from the stochastic CIR process are asymptotically unbiased. Use of the stochastic CIR process within an SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.
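
For context, here is a single step of stochastic gradient Langevin dynamics, the kind of discrete-time approximation the abstract refers to; near a sparse simplex point the Gaussian discretisation noise can push small components negative, which is the boundary error the proposed stochastic CIR process avoids. The gradient and constants are illustrative, and this is not the CIR-based sampler itself.

```python
import numpy as np

# One stochastic gradient Langevin dynamics (SGLD) step near the simplex
# boundary. The Gaussian noise of the discretisation can drive small
# components negative, illustrating the failure mode discussed above.
# The minibatch gradient is a hypothetical stand-in.

rng = np.random.default_rng(0)
theta = np.array([0.96, 0.02, 0.01, 0.01])   # sparse point on the simplex
eps = 1e-3                                    # step size

def stochastic_grad_log_post(theta):
    # Placeholder for a minibatch gradient of the log posterior.
    return rng.normal(scale=1.0, size=theta.shape)

noise = rng.normal(size=theta.shape)
theta_next = theta + 0.5 * eps * stochastic_grad_log_post(theta) + np.sqrt(eps) * noise
print("proposed point:", theta_next)          # small coordinates may go negative
```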
