Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them using off-the-shelf optimizers such as stochastic gradient descent and its variants, which leverage the local geometry and update iteratively. Even though solving non-convex functions is NP-hard in the worst case, the optimization quality in practice is often not an issue -- optimizers are largely believed to find approximate global minima. Researchers hypothesize a unified explanation for this intriguing phenomenon: most of the local minima of the practically-used objectives are approximately global minima. We rigorously formalize it for concrete instances of machine learning problems.

6
下载
关闭预览

相关内容

最优化是应用数学的一个分支,主要指在一定条件限制下,选取某种研究方案使目标达到最优的一种方法。最优化问题在当今的军事、工程、管理等领域有着极其广泛的应用。

An attention matrix of a transformer self-attention sublayer can provably be decomposed into two components and only one of them (effective attention) contributes to the model output. This leads us to ask whether visualizing effective attention gives different conclusions than interpretation of standard attention. Using a subset of the GLUE tasks and BERT, we carry out an analysis to compare the two attention matrices, and show that their interpretations differ. Effective attention is less associated with the features related to the language modeling pretraining such as the separator token, and it has more potential to illustrate linguistic features captured by the model for solving the end-task. Given the found differences, we recommend using effective attention for studying a transformer's behavior since it is more pertinent to the model output by design.

0
0
下载
预览

The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.

0
14
下载
预览

A key challenge of big data analytics is how to collect a large volume of (labeled) data. Crowdsourcing aims to address this challenge via aggregating and estimating high-quality data (e.g., sentiment label for text) from pervasive clients/users. Existing studies on crowdsourcing focus on designing new methods to improve the aggregated data quality from unreliable/noisy clients. However, the security aspects of such crowdsourcing systems remain under-explored to date. We aim to bridge this gap in this work. Specifically, we show that crowdsourcing is vulnerable to data poisoning attacks, in which malicious clients provide carefully crafted data to corrupt the aggregated data. We formulate our proposed data poisoning attacks as an optimization problem that maximizes the error of the aggregated data. Our evaluation results on one synthetic and two real-world benchmark datasets demonstrate that the proposed attacks can substantially increase the estimation errors of the aggregated data. We also propose two defenses to reduce the impact of malicious clients. Our empirical results show that the proposed defenses can substantially reduce the estimation errors of the data poisoning attacks.

0
4
下载
预览

Deep neural networks can achieve great successes when presented with large data sets and sufficient computational resources. However, their ability to learn new concepts quickly is quite limited. Meta-learning is one approach to address this issue, by enabling the network to learn how to learn. The exciting field of Deep Meta-Learning advances at great speed, but lacks a unified, insightful overview of current techniques. This work presents just that. After providing the reader with a theoretical foundation, we investigate and summarize key methods, which are categorized into i) metric-, ii) model-, and iii) optimization-based techniques. In addition, we identify the main open challenges, such as performance evaluations on heterogeneous benchmarks, and reduction of the computational costs of meta-learning.

0
4
下载
预览

This work considers the question of how convenient access to copious data impacts our ability to learn causal effects and relations. In what ways is learning causality in the era of big data different from -- or the same as -- the traditional one? To answer this question, this survey provides a comprehensive and structured review of both traditional and frontier methods in learning causality and relations along with the connections between causality and machine learning. This work points out on a case-by-case basis how big data facilitates, complicates, or motivates each approach.

0
21
下载
预览

The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these problems may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often refereed to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning as well as an overview of hitherto attempts at handling uncertainty in general and formalizing this distinction in particular.

0
8
下载
预览

When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.

0
79
下载
预览

Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.

0
16
下载
预览

In this paper, from a theoretical perspective, we study how powerful graph neural networks (GNNs) can be for learning approximation algorithms for combinatorial problems. To this end, we first establish a new class of GNNs that can solve strictly a wider variety of problems than existing GNNs. Then, we bridge the gap between GNN theory and the theory of distributed local algorithms to theoretically demonstrate that the most powerful GNN can learn approximation algorithms for the minimum dominating set problem and the minimum vertex cover problem with some approximation ratios and that no GNN can perform better than with these ratios. This paper is the first to elucidate approximation ratios of GNNs for combinatorial problems. Furthermore, we prove that adding coloring or weak-coloring to each node feature improves these approximation ratios. This indicates that preprocessing and feature engineering theoretically strengthen model capabilities.

0
6
下载
预览

Lots of learning tasks require dealing with graph data which contains rich relation information among elements. Modeling physics system, learning molecular fingerprints, predicting protein interface, and classifying diseases require that a model to learn from graph inputs. In other domains such as learning from non-structural data like texts and images, reasoning on extracted structures, like the dependency tree of sentences and the scene graph of images, is an important research topic which also needs graph reasoning models. Graph neural networks (GNNs) are connectionist models that capture the dependence of graphs via message passing between the nodes of graphs. Unlike standard neural networks, graph neural networks retain a state that can represent information from its neighborhood with an arbitrary depth. Although the primitive graph neural networks have been found difficult to train for a fixed point, recent advances in network architectures, optimization techniques, and parallel computation have enabled successful learning with them. In recent years, systems based on graph convolutional network (GCN) and gated graph neural network (GGNN) have demonstrated ground-breaking performance on many tasks mentioned above. In this survey, we provide a detailed review over existing graph neural network models, systematically categorize the applications, and propose four open problems for future research.

0
67
下载
预览
小贴士
相关论文
Kaiser Sun,Ana Marasović
0+阅读 · 5月18日
Peter L. Bartlett,Andrea Montanari,Alexander Rakhlin
14+阅读 · 3月16日
Minghong Fang,Minghao Sun,Qi Li,Neil Zhenqiang Gong,Jin Tian,Jia Liu
4+阅读 · 2月18日
Mike Huisman,Jan N. van Rijn,Aske Plaat
4+阅读 · 2020年10月7日
Ruocheng Guo,Lu Cheng,Jundong Li,P. Richard Hahn,Huan Liu
21+阅读 · 2020年5月5日
Optimization for deep learning: theory and algorithms
Ruoyu Sun
79+阅读 · 2019年12月19日
Advances and Open Problems in Federated Learning
Peter Kairouz,H. Brendan McMahan,Brendan Avent,Aurélien Bellet,Mehdi Bennis,Arjun Nitin Bhagoji,Keith Bonawitz,Zachary Charles,Graham Cormode,Rachel Cummings,Rafael G. L. D'Oliveira,Salim El Rouayheb,David Evans,Josh Gardner,Zachary Garrett,Adrià Gascón,Badih Ghazi,Phillip B. Gibbons,Marco Gruteser,Zaid Harchaoui,Chaoyang He,Lie He,Zhouyuan Huo,Ben Hutchinson,Justin Hsu,Martin Jaggi,Tara Javidi,Gauri Joshi,Mikhail Khodak,Jakub Konečný,Aleksandra Korolova,Farinaz Koushanfar,Sanmi Koyejo,Tancrède Lepoint,Yang Liu,Prateek Mittal,Mehryar Mohri,Richard Nock,Ayfer Özgür,Rasmus Pagh,Mariana Raykova,Hang Qi,Daniel Ramage,Ramesh Raskar,Dawn Song,Weikang Song,Sebastian U. Stich,Ziteng Sun,Ananda Theertha Suresh,Florian Tramèr,Praneeth Vepakomma,Jianyu Wang,Li Xiong,Zheng Xu,Qiang Yang,Felix X. Yu,Han Yu,Sen Zhao
16+阅读 · 2019年12月10日
Ryoma Sato,Makoto Yamada,Hisashi Kashima
6+阅读 · 2019年5月24日
Graph Neural Networks: A Review of Methods and Applications
Jie Zhou,Ganqu Cui,Zhengyan Zhang,Cheng Yang,Zhiyuan Liu,Maosong Sun
67+阅读 · 2018年12月20日
相关资讯
LibRec 精选:AutoML for Contextual Bandits
LibRec智能推荐
6+阅读 · 2019年9月19日
强化学习的Unsupervised Meta-Learning
CreateAMind
7+阅读 · 2019年1月7日
Unsupervised Learning via Meta-Learning
CreateAMind
28+阅读 · 2019年1月3日
meta learning 17年:MAML SNAIL
CreateAMind
9+阅读 · 2019年1月2日
A Technical Overview of AI & ML in 2018 & Trends for 2019
待字闺中
10+阅读 · 2018年12月24日
Disentangled的假设的探讨
CreateAMind
7+阅读 · 2018年12月10日
【SIGIR2018】五篇对抗训练文章
专知
11+阅读 · 2018年7月9日
神经网络学习率设置
机器学习研究会
4+阅读 · 2018年3月3日
NIPS 2017:贝叶斯深度学习与深度贝叶斯学习(讲义+视频)
机器学习研究会
29+阅读 · 2017年12月10日
Capsule Networks解析
机器学习研究会
10+阅读 · 2017年11月12日
Top