Generative adversarial networks (GANs) are a widely used framework for learning generative models. Wasserstein GANs (WGANs), one of the most successful variants of GANs, require solving a minmax optimization problem to global optimality, but are in practice successfully trained using stochastic gradient descent-ascent. In this paper, we show that, when the generator is a one-layer network, stochastic gradient descent-ascent converges to a global solution with polynomial time and sample complexity.

Given a sequence of sets, where each set contains an arbitrary number of elements, the problem of temporal sets prediction aims to predict the elements in the subsequent set. In practice, temporal sets prediction is much more complex than predictive modelling of temporal events and time series, and is still an open problem. Many possible existing methods, if adapted for the problem of temporal sets prediction, usually follow a two-step strategy by first projecting temporal sets into latent representations and then learning a predictive model with the latent representations. The two-step approach often leads to information loss and unsatisfactory prediction performance. In this paper, we propose an integrated solution based on the deep neural networks for temporal sets prediction. A unique perspective of our approach is to learn element relationship by constructing set-level co-occurrence graph and then perform graph convolutions on the dynamic relationship graphs. Moreover, we design an attention-based module to adaptively learn the temporal dependency of elements and sets. Finally, we provide a gated updating mechanism to find the hidden shared patterns in different sequences and fuse both static and dynamic information to improve the prediction performance. Experiments on real-world data sets demonstrate that our approach can achieve competitive performances even with a portion of the training data and can outperform existing methods with a significant margin.

Empirical risk minimization (ERM) is typically designed to perform well on the average loss, which can result in estimators that are sensitive to outliers, generalize poorly, or treat subgroups unfairly. While many methods aim to address these problems individually, in this work, we explore them through a unified framework---tilted empirical risk minimization (TERM). In particular, we show that it is possible to flexibly tune the impact of individual losses through a straightforward extension to ERM using a hyperparameter called the tilt. We provide several interpretations of the resulting framework: We show that TERM can increase or decrease the influence of outliers, respectively, to enable fairness or robustness; has variance-reduction properties that can benefit generalization; and can be viewed as a smooth approximation to a superquantile method. We develop batch and stochastic first-order optimization methods for solving TERM, and show that the problem can be efficiently solved relative to common alternatives. Finally, we demonstrate that TERM can be used for a multitude of applications, such as enforcing fairness between subgroups, mitigating the effect of outliers, and handling class imbalance. TERM is not only competitive with existing solutions tailored to these individual problems, but can also enable entirely new applications, such as simultaneously addressing outliers and promoting fairness.

We comprehensively reveal the learning dynamics of deep neural networks (DNN) with batch normalization (BN) and weight decay (WD), named as Spherical Motion Dynamics (SMD). Our theorem on SMD is based on the scale-invariant property of weights caused by BN, and regularization effect of WD. SMD shows the optimization trajectory of weights is like a spherical motion; and a new indicator, angular update is proposed to measure the update efficiency of DNN with BN and WD. We rigorously prove that the angular update is only determined by pre-defined hyper-parameters (i.e. learning rate, WD parameter and momentum coefficient), and provide their quantitative relationship. Most importantly, the quantitative result of SMD can perfectly match the empirical observation in complex and large scale computer vision tasks like ImageNet and COCO with standard training schemes. SMD can also yield reasonable interpretations on some phenomena about BN from an entirely new perspective, including avoidance of vanishing and exploding gradient, no risk of being trapped into sharp minima, and sudden drop of loss when shrinking learning rate. Further, to present the practical significance of SMD, we discuss the connection between SMD and commonly used learning rate tuning scheme: Linear Scaling Principle.

Recent advances of network architecture for point cloud processing are mainly driven by new designs of local aggregation operators. However, the impact of these operators to network performance is not carefully investigated due to different overall network architecture and implementation details in each solution. Meanwhile, most of operators are only applied in shallow architectures. In this paper, we revisit the representative local aggregation operators and study their performance using the same deep residual architecture. Our investigation reveals that despite the different designs of these operators, all of these operators make surprisingly similar contributions to the network performance under the same network input and feature numbers and result in the state-of-the-art accuracy on standard benchmarks. This finding stimulate us to rethink the necessity of sophisticated design of local aggregation operator for point cloud processing. To this end, we propose a simple local aggregation operator without learnable weights, named Position Pooling (PosPool), which performs similarly or slightly better than existing sophisticated operators. In particular, a simple deep residual network with PosPool layers achieves outstanding performance on all benchmarks, which outperforms the previous state-of-the methods on the challenging PartNet datasets by a large margin (7.4 mIoU). The code is publicly available at https://github.com/zeliu98/CloserLook3D

Unsteady fluid systems are nonlinear high-dimensional dynamical systems that may exhibit multiple complex phenomena both in time and space. Reduced Order Modeling (ROM) of fluid flows has been an active research topic in the recent decade with the primary goal to decompose complex flows to a set of features most important for future state prediction and control, typically using a dimensionality reduction technique. In this work, a novel data-driven technique based on the power of deep neural networks for reduced order modeling of the unsteady fluid flows is introduced. An autoencoder network is used for nonlinear dimension reduction and feature extraction as an alternative for singular value decomposition (SVD). Then, the extracted features are used as an input for long short-term memory network (LSTM) to predict the velocity field at future time instances. The proposed autoencoder-LSTM method is compared with dynamic mode decomposition (DMD) as the data-driven base method. Moreover, an autoencoder-DMD algorithm is introduced for reduced order modeling, which uses the autoencoder network for dimensionality reduction rather than SVD rank truncation. Results show that the autoencoder-LSTM method is considerably capable of predicting the fluid flow evolution, where higher values for coefficient of determination $R^{2}$ are obtained using autoencoder-LSTM comparing to DMD.

Deep neural networks are being increasingly used in real world applications (e.g. surveillance, face recognition). This has resulted in concerns about the fairness of decisions made by these models. Various notions and measures of fairness have been proposed to ensure that a decision-making system does not disproportionately harm (or benefit) particular subgroups of population. In this paper, we argue that traditional notions of fairness that are only based on models' outputs are not sufficient when decision-making systems such as deep networks are vulnerable to adversarial attacks. We argue that in some cases, it may be easier for an attacker to target a particular subgroup, resulting in a form of \textit{robustness bias}. We propose a new notion of \textit{adversarial fairness} that requires all subgroups to be equally robust to adversarial perturbations. We show that state-of-the-art neural networks can exhibit robustness bias on real world datasets such as CIFAR10, CIFAR100, Adience, and UTKFace. We then formulate a measure of our proposed fairness notion and use it as a regularization term to decrease the robustness bias in the traditional empirical risk minimization objective. Through empirical evidence, we show that training with our proposed regularization term can partially mitigate adversarial unfairness while maintaining reasonable classification accuracy.

While current research has shown the importance of Multi-parametric MRI (mpMRI) in diagnosing prostate cancer (PCa), further investigation is needed for how to incorporate the specific structures of the mpMRI data, such as the regional heterogeneity and between-voxel correlation within a subject. This paper proposes a machine learning-based method for improved voxel-wise PCa classification by taking into account the unique structures of the data. We propose a multi-resolution modeling approach to account for regional heterogeneity, where base learners trained locally at multiple resolutions are combined using the super learner, and account for between-voxel correlation by efficient spatial Gaussian kernel smoothing. The method is flexible in that the super learner framework allows implementation of any classifier as the base learner, and can be easily extended to classifying cancer into more sub-categories. We describe detailed classification algorithm for the binary PCa status, as well as the ordinal clinical significance of PCa for which a weighted likelihood approach is implemented to enhance the detection of the less prevalent cancer categories. We illustrate the advantages of the proposed approach over conventional modeling and machine learning approaches through simulations and application to in vivo data.

Neural network architectures are most often conceptually designed and described in visual terms, but are implemented by writing error-prone code. PrototypeML is a machine learning development environment that bridges the dichotomy between the design and development processes: it provides a highly intuitive visual neural network design interface that supports (yet abstracts) the full capabilities of the PyTorch deep learning framework, reduces model design and development time, makes debugging easier, and automates many framework and code writing idiosyncrasies. In this paper, we detail the deep learning development deficiencies that drove the implementation of PrototypeML, and propose a hybrid approach to resolve these issues without limiting network expressiveness or reducing code quality. We demonstrate the real-world benefits of a visual approach to neural network design for research, industry and teaching. Available at https://PrototypeML.com

Stochastic processes provide a mathematically elegant way model complex data. In theory, they provide flexible priors over function classes that can encode a wide range of interesting assumptions. In practice, however, efficient inference by optimisation or marginalisation is difficult, a problem further exacerbated with big data and high dimensional input spaces. We propose a novel variational autoencoder (VAE) called the prior encoding variational autoencoder ($\pi$VAE). The $\pi$VAE is finitely exchangeable and Kolmogorov consistent, and thus is a continuous stochastic process. We use $\pi$VAE to learn low dimensional embeddings of function classes. We show that our framework can accurately learn expressive function classes such as Gaussian processes, but also properties of functions to enable statistical inference (such as the integral of a log Gaussian process). For popular tasks, such as spatial interpolation, $\pi$VAE achieves state-of-the-art performance both in terms of accuracy and computational efficiency. Perhaps most usefully, we demonstrate that the low dimensional independently distributed latent space representation learnt provides an elegant and scalable means of performing Bayesian inference for stochastic processes within probabilistic programming languages such as Stan.