Collaborative machine learning (ML), also known as federated ML, allows participants to jointly train a model without data sharing. To update the model parameters, the central parameter server broadcasts model parameters to the participants, and the participants send ascending directions such as gradients to the server. While data do not leave a participant's device, the communicated gradients and parameters will leak a participant's privacy. Prior work proposed attacks that infer participant's privacy from gradients and parameters, and they showed simple defenses like dropout and differential privacy do not help much. To defend privacy leakage, we propose a method called Double Blind Collaborative Learning (DBCL) which is based on random matrix sketching. The high-level idea is to apply a random transformation to the parameters, data, and gradients in every iteration so that the existing attacks will fail or become less effective. While it improves the security of collaborative ML, DBCL does not increase the computation and communication cost much and does not hurt prediction accuracy at all. DBCL can be potentially applied to decentralized collaborative ML to defend privacy leakage.

Machine learning methods have recently achieved high-performance in biomedical text analysis. However, a major bottleneck in the widespread application of these methods is obtaining the required large amounts of annotated training data, which is resource intensive and time consuming. Recent progress in self-supervised learning has shown promise in leveraging large text corpora without explicit annotations. In this work, we built a self-supervised contextual language representation model using BERT, a deep bidirectional transformer architecture, to identify radiology reports requiring prompt communication to the referring physicians. We pre-trained the BERT model on a large unlabeled corpus of radiology reports and used the resulting contextual representations in a final text classifier for communication urgency. Our model achieved a precision of 97.0%, recall of 93.3%, and F-measure of 95.1% on an independent test set in identifying radiology reports for prompt communication, and significantly outperformed the previous state-of-the-art model based on word2vec representations.

Recent work on deep neural network pruning has shown there exist sparse subnetworks that achieve equal or improved accuracy, training time, and loss using fewer network parameters when compared to their dense counterparts. Orthogonal to pruning literature, deep neural networks are known to be susceptible to adversarial examples, which may pose risks in security- or safety-critical applications. Intuition suggests that there is an inherent trade-off between sparsity and robustness such that these characteristics could not co-exist. We perform an extensive empirical evaluation and analysis testing the Lottery Ticket Hypothesis with adversarial training and show this approach enables us to find sparse, robust neural networks. Code for reproducing experiments is available here: https://github.com/justincosentino/robust-sparse-networks.

Machine learning has started to be deployed in fields such as healthcare and finance, which propelled the need for and growth of privacy-preserving machine learning (PPML). We propose an actively secure four-party protocol (4PC), and a framework for PPML, showcasing its applications on four of the most widely-known machine learning algorithms -- Linear Regression, Logistic Regression, Neural Networks, and Convolutional Neural Networks. Our 4PC protocol tolerating at most one malicious corruption is practically efficient as compared to the existing works. We use the protocol to build an efficient mixed-world framework (Trident) to switch between the Arithmetic, Boolean, and Garbled worlds. Our framework operates in the offline-online paradigm over rings and is instantiated in an outsourced setting for machine learning. Also, we propose conversions especially relevant to privacy-preserving machine learning. The highlights of our framework include using a minimal number of expensive circuits overall as compared to ABY3. This can be seen in our technique for truncation, which does not affect the online cost of multiplication and removes the need for any circuits in the offline phase. Our B2A conversion has an improvement of $\mathbf{7} \times$ in rounds and $\mathbf{18} \times$ in the communication complexity. In addition to these, all of the special conversions for machine learning, e.g. Secure Comparison, achieve constant round complexity. The practicality of our framework is argued through improvements in the benchmarking of the aforementioned algorithms when compared with ABY3. All the protocols are implemented over a 64-bit ring in both LAN and WAN settings. Our improvements go up to $\mathbf{187} \times$ for the training phase and $\mathbf{158} \times$ for the prediction phase when observed over LAN and WAN.

Environmental stresses such as drought and heat can cause substantial yield loss in agriculture. As such, hybrid crops that are tolerant to drought and heat stress would produce more consistent yields compared to the hybrids that are not tolerant to these stresses. In the 2019 Syngenta Crop Challenge, Syngenta released several large datasets that recorded the yield performances of 2,452 corn hybrids planted in 1,560 locations between 2008 and 2017 and asked participants to classify the corn hybrids as either tolerant or susceptible to drought stress, heat stress, and combined drought and heat stress. However, no data was provided that classified any set of hybrids as tolerant or susceptible to any type of stress. In this paper, we present an unsupervised approach to solving this problem, which was recognized as one of the winners in the 2019 Syngenta Crop Challenge. Our results labeled 121 hybrids as drought tolerant, 193 as heat tolerant, and 29 as tolerant to both stresses.

In this paper, we obtain fundamental $\mathcal{L}_{p}$ bounds in sequential prediction and recursive algorithms via an entropic analysis. Both classes of problems are examined by investigating the underlying entropic relationships of the data and/or noises involved, and the derived lower bounds may all be quantified in a conditional entropy characterization. We also study the conditions to achieve the generic bounds from an innovations' viewpoint.

Event ticket price prediction is important to marketing strategy for any sports team or musical ensemble. An accurate prediction model can help the marketing team to make promotion plan more effectively and efficiently. However, given all the historical transaction records, it is challenging to predict the sale price of the remaining seats at any future timestamp, not only because that the sale price is relevant to a lot of features (seat locations, date-to-event of the transaction, event date, team performance, etc.), but also because of the temporal and spatial sparsity in the dataset. For a game/concert, the ticket selling price of one seat is only observable once at the time of sale. Furthermore, some seats may not even be purchased (therefore no record available). In fact, data sparsity is commonly encountered in many prediction problems. Here, we propose a bi-level optimizing deep neural network to address the curse of spatio-temporal sparsity. Specifically, we introduce coarsening and refining layers, and design a bi-level loss function to integrate different level of loss for better prediction accuracy. Our model can discover the interrelations among ticket sale price, seat locations, selling time, event information, etc. Experiments show that our proposed model outperforms other benchmark methods in real-world ticket selling price prediction.

Robust machine learning is currently one of the most prominent topics which could potentially help shaping a future of advanced AI platforms that not only perform well in average cases but also in worst cases or adverse situations. Despite the long-term vision, however, existing studies on black-box adversarial attacks are still restricted to very specific settings of threat models (e.g., single distortion metric and restrictive assumption on target model's feedback to queries) and/or suffer from prohibitively high query complexity. To push for further advances in this field, we introduce a general framework based on an operator splitting method, the alternating direction method of multipliers (ADMM) to devise efficient, robust black-box attacks that work with various distortion metrics and feedback settings without incurring high query complexity. Due to the black-box nature of the threat model, the proposed ADMM solution framework is integrated with zeroth-order (ZO) optimization and Bayesian optimization (BO), and thus is applicable to the gradient-free regime. This results in two new black-box adversarial attack generation methods, ZO-ADMM and BO-ADMM. Our empirical evaluations on image classification datasets show that our proposed approaches have much lower function query complexities compared to state-of-the-art attack methods, but achieve very competitive attack success rates.

Convolutional neural network (CNN) and its variants have led to many state-of-art results in various fields. However, a clear theoretical understanding about them is still lacking. Recently, multi-layer convolutional sparse coding (ML-CSC) has been proposed and proved to equal such simply stacked networks (plain networks). Here, we think three factors in each layer of it including the initialization, the dictionary design and the number of iterations greatly affect the performance of ML-CSC. Inspired by these considerations, we propose two novel multi-layer models--residual convolutional sparse coding model (Res-CSC) and mixed-scale dense convolutional sparse coding model (MSD-CSC), which have close relationship with the residual neural network (ResNet) and mixed-scale (dilated) dense neural network (MSDNet), respectively. Mathematically, we derive the shortcut connection in ResNet as a special case of a new forward propagation rule on ML-CSC. We find a theoretical interpretation of the dilated convolution and dense connection in MSDNet by analyzing MSD-CSC, which gives a clear mathematical understanding about them. We implement the iterative soft thresholding algorithm (ISTA) and its fast version to solve Res-CSC and MSD-CSC, which can employ the unfolding operation for further improvements. At last, extensive numerical experiments and comparison with competing methods demonstrate their effectiveness using three typical datasets.

One of the impediments in advancing actuarial research and developing open source assets for insurance analytics is the lack of realistic publicly available datasets. In this work, we develop a workflow for synthesizing insurance datasets leveraging state-of-the-art neural network techniques. We evaluate the predictive modeling efficacy of datasets synthesized from publicly available data in the domains of general insurance pricing and life insurance shock lapse modeling. The trained synthesizers are able to capture representative characteristics of the real datasets. This workflow is implemented via an R interface to promote adoption by researchers and data owners.

Quality Diversity (QD) algorithms like Novelty Search with Local Competition (NSLC) and MAP-Elites are a new class of population-based stochastic algorithms designed to generate a diverse collection of quality solutions. Meanwhile, variants of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) are among the best-performing derivative-free optimizers in single-objective continuous domains. This paper proposes a new QD algorithm called Covariance Matrix Adaptation MAP-Elites (CMA-ME). Our new algorithm combines the dynamic self-adaptation techniques of CMA-ES with archiving and mapping techniques for maintaining diversity in QD. Results from experiments with standard continuous optimization benchmarks show that CMA-ME finds better-quality solutions than MAP-Elites; similarly, results on the strategic game Hearthstone show that CMA-ME finds both a higher overall quality and broader diversity of strategies than both CMA-ES and MAP-Elites. Overall, CMA-ME more than doubles the performance of MAP-Elites using standard QD performance metrics. These results suggest that QD algorithms augmented by operators from state-of-the-art optimization algorithms can yield high-performing methods for simultaneously exploring and optimizing continuous search spaces, with significant applications to design, testing, and reinforcement learning among other domains. Code is available for both the continuous optimization benchmark (https://github.com/tehqin/QualDivBenchmark) and Hearthstone (https://github.com/tehqin/EvoStone) domains.

Recently, neural vocoders have been widely used in speech synthesis tasks, including text-to-speech and voice conversion. However, in the encounter of data distribution mismatch between training and inference, neural vocoders trained on real data often degrade in voice quality for unseen scenarios. In this paper, we train three commonly used neural vocoders, including WaveNet, WaveRNN, and WaveGlow, alternately on five different datasets. To study the robustness of neural vocoders, we evaluate the models using acoustic features from seen/unseen speakers, seen/unseen languages, a text-to-speech model, and a voice conversion model. In this work, we found that WaveNet is more robust than WaveRNN, especially in the face of inconsistency between training and testing data. Through our experiments, we show that WaveNet is more suitable for text-to-speech models, and WaveRNN more suitable for voice conversion applications. Furthermore, we present results with considerable reference value of subjective human evaluation for future studies.

The ability to analyze and forecast stratospheric weather conditions is fundamental to addressing climate change. However, our capacity to collect data in the stratosphere is limited by sparsely deployed weather balloons. We propose a framework to collect stratospheric data by releasing a contrail of tiny sensor devices as a weather balloon ascends. The key machine learning challenges are determining when and how to deploy a finite collection of sensors to produce a useful data set. We decide when to release sensors by modeling the deviation of a forecast from actual stratospheric conditions as a Gaussian process. We then implement a novel hardware system that is capable of optimally releasing sensors from a rising weather balloon. We show that this data engineering framework is effective through real weather balloon flights, as well as simulations.

We recover a video of the motion taking place in a hidden scene by observing changes in indirect illumination in a nearby uncalibrated visible region. We solve this problem by factoring the observed video into a matrix product between the unknown hidden scene video and an unknown light transport matrix. This task is extremely ill-posed, as any non-negative factorization will satisfy the data. Inspired by recent work on the Deep Image Prior, we parameterize the factor matrices using randomly initialized convolutional neural networks trained in a one-off manner, and show that this results in decompositions that reflect the true motion in the hidden scene.

Top