Dropout is a simple but effective regularization technique for improving the generalization of deep neural networks (DNNs); hence it is widely used in DNN-based tasks. During training, dropout randomly discards a portion of the neurons to avoid overfitting. This paper presents an enhanced dropout technique, which we call multi-sample dropout, that both accelerates training and improves generalization over the original dropout. The original dropout creates a randomly selected subset (called a dropout sample) from the input in each training iteration, whereas multi-sample dropout creates multiple dropout samples. The loss is calculated for each sample, and the sample losses are then averaged to obtain the final loss. The technique can be implemented easily, without a new operator, by duplicating the part of the network after the dropout layer while sharing the weights among the duplicated fully connected layers. Experimental results showed that multi-sample dropout significantly accelerates training by reducing the number of iterations until convergence on image classification tasks using the ImageNet, CIFAR-10, CIFAR-100, and SVHN datasets. Multi-sample dropout does not significantly increase the computation cost per iteration, because most of the computation time is consumed in the convolution layers before the dropout layer, which are not duplicated. Experiments also showed that networks trained with multi-sample dropout achieved lower error rates and losses on both the training and validation sets.
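The technique lends itself to a compact implementation. Below is a minimal PyTorch sketch of a multi-sample dropout classification head; the class and argument names (`MultiSampleDropoutHead`, `num_samples`) are illustrative, not the authors' code:

```python
# Minimal sketch of multi-sample dropout (hypothetical names, not the
# authors' code): several dropout samples share one fully connected layer,
# and the per-sample losses are averaged into the final loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiSampleDropoutHead(nn.Module):
    def __init__(self, in_features, num_classes, num_samples=8, p=0.5):
        super().__init__()
        self.dropout = nn.Dropout(p)
        self.fc = nn.Linear(in_features, num_classes)  # weights shared across samples
        self.num_samples = num_samples

    def loss(self, features, target):
        # Each iteration draws several dropout masks from the same features;
        # the expensive convolutional trunk below this head runs only once.
        losses = [
            F.cross_entropy(self.fc(self.dropout(features)), target)
            for _ in range(self.num_samples)
        ]
        return torch.stack(losses).mean()
```

Because only the cheap head after the dropout layer is duplicated, the extra cost per iteration stays small, matching the observation above.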

Can prior network pruning strategies eliminate redundancy in multiple correlated pre-trained deep neural networks? A positive answer seems plausible if the networks are first combined and then pruned. However, we argue that an arbitrarily combined network may lead to sub-optimal pruning performance, because its intra- and inter-redundancy may not be minimised at the same time while retaining the inference accuracy in each task. In this paper, we define and analyse the redundancy in multi-task networks from an information-theoretic perspective, and identify the challenges that prevent existing pruning methods from functioning effectively for multi-task pruning. We propose Redundancy-Disentangled Networks (RDNets), which decouple intra- and inter-redundancy such that all redundancy can be suppressed via previous network pruning schemes. A pruned RDNet also ensures minimal computation in any subset of tasks, a desirable feature for selective task execution. Moreover, a heuristic is devised to construct an RDNet from multiple pre-trained networks. Experiments on CelebA show that the same pruning method on an RDNet achieves at least 1.8x lower memory usage and 1.4x lower computation cost than on a multi-task network constructed by the state-of-the-art network merging scheme.

Automatic generation of level maps is a popular form of automatic content generation. In this study, a recently developed technique employing the {\em do what's possible} representation is used to create open-ended level maps. Generation of the map can continue indefinitely, yielding a highly scalable representation. A parameter study is performed to find good parameters for the evolutionary algorithm used to locate high-quality map generators. Variations on the technique are presented, demonstrating its versatility, and an algorithmic variant is given that both improves performance and changes the character of the maps located. The ability of the map to adapt to different regions where it is permitted to occupy space is also tested.

With the widespread use of social media, companies now have access to a wealth of customer feedback data with valuable applications to Customer Relationship Management (CRM). Analyzing customer grievance data is paramount, as failure to redress grievances speedily leads to customer churn and lower profitability. In this paper, we propose a descriptive analytics framework using the self-organizing feature map (SOM) for visual sentiment analysis of customer complaints. The network automatically learns the inherent grouping of the complaints, which can then be visualized using various techniques. Analytical Customer Relationship Management (ACRM) executives can draw useful business insights from the maps and take timely remedial action. We also propose a high-performance version of the algorithm, CUDASOM (CUDA-based Self-Organizing feature Map), implemented on NVIDIA's parallel computing platform, CUDA, which speeds up the processing of high-dimensional text data and generates results quickly. The efficacy of the proposed model is demonstrated on customer complaints regarding the products and services of four leading Indian banks. CUDASOM achieved an average speedup of 44 times. Our approach can expand research into intelligent grievance redressal systems that provide rapid solutions to complaining customers.
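For reference, the sequential algorithm that CUDASOM parallelizes can be sketched in a few lines. This is a generic NumPy SOM training loop under standard assumptions (Gaussian neighborhood, exponentially decaying rates); the grid size and hyperparameters are illustrative, not those of the paper:

```python
# Generic self-organizing feature map (SOM) training sketch in NumPy;
# hyperparameters are illustrative, not those of the paper.
import numpy as np

def train_som(data, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0):
    h, w = grid
    rng = np.random.default_rng(0)
    weights = rng.random((h, w, data.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    for t in range(epochs):
        lr = lr0 * np.exp(-t / epochs)        # decaying learning rate
        sigma = sigma0 * np.exp(-t / epochs)  # shrinking neighborhood
        for x in rng.permutation(data):
            # Best-matching unit: the node whose weight vector is closest to x.
            bmu = np.unravel_index(
                np.argmin(np.linalg.norm(weights - x, axis=-1)), (h, w))
            # A Gaussian neighborhood pulls nearby nodes toward x.
            g = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=-1)
                       / (2 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
    return weights
```

The distance computation and weight update are independent across map nodes, which is the parallelism a CUDA implementation exploits.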

In the present study, an amplifying neuron and an attenuating neuron, both of which can be incorporated into neural networks without significant additional computational effort, are proposed. The amplifying neuron squares its activated output value, while the attenuating neuron takes its reciprocal. Theoretically, the order of a neural network increases when an amplifying neuron is placed in the hidden layer. Performance assessments were conducted to verify that the amplifying and attenuating neurons enhance the performance of neural networks. The numerical experiments revealed that neural networks containing the amplifying and attenuating neurons yield more accurate results than those without them.
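The two neurons as described map directly to a pair of element-wise functions. A minimal NumPy sketch follows, with a small epsilon (our addition) guarding the reciprocal against division by zero; the sigmoid is only a stand-in for whatever base activation the network uses:

```python
# Sketch of the amplifying and attenuating neurons as described above:
# the amplifying neuron squares its activated output, the attenuating
# neuron takes the reciprocal. The epsilon guard is our addition.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def amplifying_neuron(z):
    a = sigmoid(z)
    return a ** 2              # squaring raises the order of the network

def attenuating_neuron(z, eps=1e-8):
    a = sigmoid(z)
    return 1.0 / (a + eps)     # reciprocal of the activated value
```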

Architectures obtained by Neural Architecture Search (NAS) have achieved highly competitive performance in various computer vision tasks. However, the prohibitive computational demand of forward-backward propagation in deep neural networks and searching algorithms makes it difficult to apply NAS in practice. In this paper, we propose multinomial distribution learning for extremely effective NAS, which treats the search space as a joint multinomial distribution: the operation between two nodes is sampled from this distribution, and the optimal network structure is obtained by taking the most probable operation on each edge. NAS is thereby transformed into a multinomial distribution learning problem, in which the distribution is optimized to have a high expected performance. In addition, we propose and empirically validate the hypothesis that the performance ranking of architectures is consistent across training epochs, which further accelerates the learning process. Experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of our method. On CIFAR-10, the structure found by our method achieves a 2.4\% test error while being 6.0$\times$ faster (only 4 GPU hours on a GTX 1080 Ti) than state-of-the-art NAS algorithms. On ImageNet, our model achieves 75.2\% top-1 accuracy under MobileNet settings (MobileNet V1/V2) while being 1.2$\times$ faster in measured GPU latency. Test code is available at https://github.com/tanglang96/MDENAS
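The distribution-learning view can be sketched abstractly. The update rule below is an illustrative reinforcement-style stand-in, not the paper's exact algorithm; see the linked repository for the authors' code:

```python
# Illustrative sketch of NAS as multinomial distribution learning: each edge
# keeps a probability vector over candidate operations, sampled architectures
# are scored, and probabilities shift toward better-scoring operations.
# The update rule is a simple stand-in, not the paper's exact algorithm.
import numpy as np

rng = np.random.default_rng(0)
num_edges, num_ops = 14, 8
probs = np.full((num_edges, num_ops), 1.0 / num_ops)

def sample_architecture():
    return np.array([rng.choice(num_ops, p=p) for p in probs])

def update(arch, reward, lr=0.05):
    for e, op in enumerate(arch):
        probs[e, op] += lr * reward       # reinforce the sampled operation
        probs[e] = np.clip(probs[e], 1e-6, None)
        probs[e] /= probs[e].sum()        # keep each row a distribution

# After training, the searched structure takes the most probable
# operation on every edge: best_arch = probs.argmax(axis=1)
```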

We introduce two approaches for combining neural evolution strategy (NES) and proximal policy optimization (PPO): parameter transfer and parameter space noise. Parameter transfer trains a PPO agent whose parameters are transferred from a NES agent. Parameter space noise adds noise directly to the PPO agent's parameters. We demonstrate that PPO can benefit from both methods through experimental comparisons on discrete-action environments as well as continuous control tasks.
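Of the two, parameter space noise is especially simple. A minimal PyTorch sketch follows; the network shape and noise scale are illustrative assumptions:

```python
# Minimal sketch of parameter space noise: Gaussian noise is added directly
# to the policy network's parameters (rather than to its actions).
# The network shape and sigma are illustrative.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))

def perturb_parameters(net, sigma=0.01):
    with torch.no_grad():
        for p in net.parameters():
            p.add_(sigma * torch.randn_like(p))
```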

Reading comprehension has received significant attention in recent years as high-quality Question Answering (QA) datasets have become available. Despite state-of-the-art methods achieving strong overall accuracy, Multi-Hop (MH) reasoning remains particularly challenging. To address MH-QA specifically, we propose a deep reinforcement learning based method capable of learning sequential reasoning across large collections of documents so as to pass a query-aware, fixed-size context subset to existing models for answer extraction. Our method comprises two stages: a linker, which decomposes the provided support documents into a graph of sentences, and an extractor, which learns where to look based on the current question and the already-visited sentences. The result of the linker is a novel graph structure at the sentence level that preserves logical flow while still allowing rapid movement between documents. Importantly, we demonstrate that the sparsity of the resulting graph is invariant to context size. This translates to fewer decisions required of the deep-RL-trained extractor, allowing the system to scale effectively to large collections of documents. The importance of sequential decision making in the document traversal step is demonstrated by comparison to standard IE methods, and we additionally introduce a BM25-based IR baseline that retrieves only the documents relevant to the query. We examine the integration of our method with existing models on the recently proposed QAngaroo benchmark and achieve consistent increases in accuracy across the board, as well as a 2-3x reduction in training time.

Recently, autonomous driving development has ignited competition among car makers and technology corporations. Cars with low-level automation are already commercially available, but highly automated vehicles, which drive themselves without human monitoring, are still in their infancy. Such autonomous vehicles (AVs) rely on the in-car computing system to interpret the environment and make driving decisions. Therefore, computing system design is essential, particularly for ensuring driving safety. However, to our knowledge, no clear guideline exists so far for safety-aware AV computing system and architecture design. To understand the safety requirements of AV computing systems, we performed a field study by running industrial Level-4 autonomous driving fleets in various locations, road conditions, and traffic patterns. The field study indicates that traditional computing system performance metrics, such as tail latency, average latency, maximum latency, and timeout, cannot fully satisfy the safety requirements of AV computing system design. To address this issue, we propose a `safety score' as a primary metric for measuring the level of safety in AV computing system design. Furthermore, we propose a perception latency model, which helps architects estimate the safety score of a given architecture and system design without physically testing it in an AV. We demonstrate the use of our safety score and latency model by developing and evaluating a safety-aware hardware resource management scheme for AV computing systems.

Lexicase selection and novelty search, two parent selection methods used in evolutionary computation, emphasize exploring the search space more widely than traditional methods such as tournament selection. However, lexicase selection is not explicitly driven to select for novelty in the population, and novelty search suffers from a lack of direction toward a goal, especially in unconstrained, high-dimensional spaces. We combine the strengths of lexicase selection and novelty search by creating a novelty score for each test case and adding those novelty scores to the normal error values used in lexicase selection. We use this new novelty-lexicase selection to solve automatic program synthesis problems, and find that it significantly outperforms both novelty search and lexicase selection. Additionally, we find that novelty search has very little success in the problem domain of program synthesis. We explore the effects of each of these methods on population diversity and long-term problem-solving performance, and give evidence supporting the hypothesis that novelty-lexicase selection resists converging to local optima better than lexicase selection.

Lexicase parent selection filters the population by considering one random training case at a time, eliminating any individuals whose error on the current case is worse than the best error in the selection pool, until a single individual remains. This process often stops before considering all training cases, meaning that it ignores the error values on any cases not yet considered. Lexicase selection can therefore select specialist individuals that have poor errors on some training cases, if they have excellent errors on others and those cases come near the start of the random list of cases used for the parent selection event in question. We hypothesize here that selecting these specialists, which may have poor total error, plays an important role in lexicase selection's observed performance advantages over error-aggregating parent selection methods such as tournament selection, which select specialists much less frequently. We conduct experiments examining this hypothesis, and find that lexicase selection's performance and diversity maintenance degrade when we deprive it of the ability to select specialists. These findings help explain the improved performance of lexicase selection compared to tournament selection, and suggest that specialists help drive evolution under lexicase selection toward global solutions.
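For concreteness, the procedure described above can be written compactly; in novelty-lexicase selection (previous abstract), the per-case errors would simply have novelty scores added to them:

```python
# Sketch of lexicase parent selection as described above: shuffle the training
# cases, then repeatedly keep only the candidates tied for the best error on
# the current case until one individual remains.
import random

def lexicase_select(population, errors, num_cases):
    # errors[i][c] is individual i's error on training case c (lower is better).
    pool = list(range(len(population)))
    cases = list(range(num_cases))
    random.shuffle(cases)
    for c in cases:
        best = min(errors[i][c] for i in pool)
        pool = [i for i in pool if errors[i][c] == best]
        if len(pool) == 1:
            break  # later cases are ignored, which is what admits specialists
    return population[random.choice(pool)]
```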

We investigate the time-series prediction accuracy of echo state networks with respect to several kinds of activation functions. We found that some activation functions with an appropriate degree of nonlinearity show high performance compared to the conventional sigmoid function.
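For context, the quantity varied in such a study is the reservoir nonlinearity f in the standard echo state network update x_{t+1} = f(W_in u_t + W x_t). A minimal NumPy sketch with a swappable activation, under the usual spectral-radius scaling; weight ranges and rho are illustrative:

```python
# Standard echo state network reservoir with a swappable activation f;
# weight ranges and spectral radius rho are illustrative.
import numpy as np

def make_esn(n_in, n_res, rho=0.9, seed=0):
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # enforce spectral radius
    return W_in, W

def run_reservoir(W_in, W, inputs, f=np.tanh):
    x = np.zeros(W.shape[0])
    states = []
    for u in inputs:
        x = f(W_in @ u + W @ x)  # swap f to compare activation functions
        states.append(x)
    return np.array(states)      # a linear readout is trained on these states
```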

Universal unitary photonic devices can apply arbitrary unitary transformations to a vector of input modes and provide a promising hardware platform for fast and energy-efficient machine learning using light. We simulate the gradient-based optimization of random unitary matrices on universal photonic devices composed of imperfect tunable interferometers. If the device components are initialized uniformly at random, the locally-interacting nature of the mesh components biases the optimization search space towards banded unitary matrices, limiting convergence to random unitary matrices. We detail a procedure for initializing the device by sampling from the distribution of random unitary matrices and show that this greatly improves convergence speed. We also explore mesh architecture improvements, such as adding extra tunable beamsplitters or permuting waveguide layers, to further improve the training speed and scalability of these devices.
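The standard way to sample from the distribution of random unitary matrices (the Haar measure) is via the QR decomposition of a complex Gaussian matrix. The sketch below shows that sampling step only; mapping the sampled unitary onto interferometer phase settings is device-specific and omitted:

```python
# Haar-random unitary sampling via QR of a complex Gaussian matrix.
# Only the sampling step is shown; mapping U onto mesh phases is omitted.
import numpy as np

def haar_random_unitary(n, seed=None):
    rng = np.random.default_rng(seed)
    z = (rng.standard_normal((n, n))
         + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    # Fixing the phases of R's diagonal makes Q Haar-distributed,
    # not merely unitary.
    return q * (d / np.abs(d))

U = haar_random_unitary(8)
assert np.allclose(U @ U.conj().T, np.eye(8))
```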

Many experiments have been performed that use evolutionary algorithms to learn the topology and connection weights of a neural network that controls a robot or virtual agent. These experiments are performed not only to better understand basic biological principles, but also in the hope that, as the methods progress, they will become competitive at automatically creating robot behaviors of interest. However, current methods are limited with respect to the (Kolmogorov) complexity of evolved behavior. Using the evolution of robot trajectories as an example, we show that by adding four features to standard methods for the evolution of neural networks, namely (1) freezing of previously evolved structure, (2) temporal scaffolding, (3) a homogeneous transfer function for output nodes, and (4) mutations that create new pathways to outputs, we can achieve an approximately linear growth of behavioral complexity over thousands of generations. Overall, the evolved complexity is up to two orders of magnitude greater than that achieved by standard methods in the experiments reported here, with the major limiting factor for further growth being the available run time. Thus, the set of methods proposed here promises to be a useful addition to various current neuroevolution methods.

In this research, we compare four different evaluation methods in coevolution on the Majority Function problem. The size of the problem is selected such that evaluation against all possible test cases is feasible. Two measures are used for the comparisons: the objective fitness, derived from evaluating solutions against all test cases, and the objective fitness correlation (OFC), defined as the correlation coefficient between subjective and objective fitness. The results of our experiments suggest that a combination of average score and weighted informativeness may provide a more accurate evaluation in coevolution. To confirm this difference, a series of t-tests on the preference between each pair of evaluation methods is performed. The results are statistically significant, and the tests for the two quality measures show similar preferences among the four evaluation methods. Experiments on Majority Function problems with larger sizes and on Parity problems are in progress, and their results will be added in the final version.
