The way how recurrently connected networks of spiking neurons in the brain acquire powerful information processing capabilities through learning has remained a mystery. This lack of understanding is linked to a lack of learning algorithms for recurrent networks of spiking neurons (RSNNs) that are both functionally powerful and can be implemented by known biological mechanisms. Since RSNNs are simultaneously a primary target for implementations of brain-inspired circuits in neuromorphic hardware, this lack of algorithmic insight also hinders technological progress in that area. The gold standard for learning in recurrent neural networks in machine learning is back-propagation through time (BPTT), which implements stochastic gradient descent with regard to a given loss function. But BPTT is unrealistic from a biological perspective, since it requires a transmission of error signals backwards in time and in space, i.e., from post- to presynaptic neurons. We show that an online merging of locally available information during a computation with suitable top-down learning signals in real-time provides highly capable approximations to BPTT. For tasks where information on errors arises only late during a network computation, we enrich locally available information through feedforward eligibility traces of synapses that can easily be computed in an online manner. The resulting new generation of learning algorithms for recurrent neural networks provides a new understanding of network learning in the brain that can be tested experimentally. In addition, these algorithms provide efficient methods for on-chip training of RSNNs in neuromorphic hardware.
This paper presents Tofu, a system that partitions very large DNN models across multiple GPU devices to reduce per-GPU memory footprint. Tofu is designed to partition a dataflow graph of fine-grained tensor operators in order to work transparently with a general-purpose deep learning platform like MXNet. In order to automatically partition each operator, we propose to describe the semantics of an operator in a simple language which represents tensors as lambda functions mapping from tensor coordinates to values. To optimally partition different operators in a dataflow graph, Tofu uses a recursive search algorithm that minimizes the total communication cost. Our experiments on an 8-GPU machine show that Tofu enables the training of very large CNN and RNN models. It also achieves 25% - 400% speedup over alternative approaches to train very large models.
Scientific documents rely on both mathematics and text to communicate ideas. Inspired by the topical correspondence between mathematical equations and word contexts observed in scientific texts, we propose a novel topic model that jointly generates mathematical equations and their surrounding text (TopicEq). Using an extension of the correlated topic model, the context is generated from a mixture of latent topics, and the equation is generated by an RNN that depends on the latent topic activations. To experiment with this model, we create a corpus of 400K equation-context pairs extracted from a range of scientific articles from arXiv, and fit the model using a variational autoencoder approach. Experimental results show that this joint model significantly outperforms existing topic models and equation models for scientific texts. Moreover, we qualitatively show that the model effectively captures the relationship between topics and mathematics, enabling novel applications such as topic-aware equation generation, equation topic inference, and topic-aware alignment of mathematical symbols and words.
Filtering point targets in highly cluttered and noisy data frames can be very challenging, especially for complex target motions. Fixed motion models can fail to provide accurate predictions, while learning based algorithm can be difficult to design (due to the variable number of targets), slow to train and dependent on separate train/test steps. To address these issues, this paper proposes a multi-target filtering algorithm which learns the motion models, on the fly, using a recurrent neural network with a long short-term memory architecture, as a regression block. The target state predictions are then corrected using a novel data association algorithm, with a low computational complexity. The proposed algorithm is evaluated over synthetic and real point target filtering scenarios, demonstrating a remarkable performance over highly cluttered data sequences.
This paper is concerned with the training of recurrent neural networks as goal-oriented dialog agents using reinforcement learning. Training such agents with policy gradients typically requires a large amount of samples. However, the collection of the required data in form of conversations between chat-bots and human agents is time-consuming and expensive. To mitigate this problem, we describe an efficient policy gradient method using positive memory retention, which significantly increases the sample-efficiency. We show that our method is 10 times more sample-efficient than policy gradients in extensive experiments on a new synthetic number guessing game. Moreover, in a real-word visual object discovery game, the proposed method is twice as sample-efficient as policy gradients and shows state-of-the-art performance.
Human motion prediction from motion capture data is a classical problem in the computer vision, and conventional methods take the holistic human body as input. These methods ignore the fact that, in various human activities, different body components (limbs and the torso) have distinctive characteristics in terms of the moving pattern. In this paper, we argue local representations on different body components should be learned separately and, based on such idea, propose a network, Skeleton Network (SkelNet), for long-term human motion prediction. Specifically, at each time-step, local structure representations of input (human body) are obtained via SkelNet's branches of component-specific layers, then the shared layer uses local spatial representations to predict the future human pose. Our SkelNet is the first to use local structure representations for predicting the human motion. Then, for short-term human motion prediction, we propose the second network, named as Skeleton Temporal Network (Skel-TNet). Skel-TNet consists of three components: SkelNet and a Recurrent Neural Network, they have advantages in learning spatial and temporal dependencies for predicting human motion, respectively; a feed-forward network that outputs the final estimation. Our methods achieve promising results on the Human3.6M dataset and the CMU motion capture dataset.
A major tenet in theoretical neuroscience is that cognitive and behavioral processes are ultimately implemented in terms of the neural system dynamics. Accordingly, a major aim for the analysis of neurophysiological measurements should lie in the identification of the computational dynamics underlying task processing. Here we advance a state space model (SSM) based on generative piecewise-linear recurrent neural networks (PLRNN) to assess dynamics from neuroimaging data. In contrast to many other nonlinear time series models which have been proposed for reconstructing latent dynamics, our model is easily interpretable in neural terms, amenable to systematic dynamical systems analysis of the resulting set of equations, and can straightforwardly be transformed into an equivalent continuous-time dynamical system. The major contributions of this paper are the introduction of a new observation model suitable for functional magnetic resonance imaging (fMRI) coupled to the latent PLRNN, an efficient stepwise training procedure that forces the latent model to capture the 'true' underlying dynamics rather than just fitting (or predicting) the observations, and of an empirical measure based on the Kullback-Leibler divergence to evaluate from empirical time series how well this goal of approximating the underlying dynamics has been achieved. We validate and illustrate the power of our approach on simulated 'ground-truth' dynamical (benchmark) systems as well as on actual experimental fMRI time series. Given that fMRI is one of the most common techniques for measuring brain activity non-invasively in human subjects, this approach may provide a novel step toward analyzing aberrant (nonlinear) dynamics for clinical assessment or neuroscientific research.
In this paper, we propose a fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN). Given extracted speaker-discriminative embeddings (a.k.a. d-vectors) from input utterances, each individual speaker is modeled by a parameter-sharing RNN, while the RNN states for different speakers interleave in the time domain. This RNN is naturally integrated with a distance-dependent Chinese restaurant process (ddCRP) to accommodate an unknown number of speakers. Our system is fully supervised and is able to learn from examples where time-stamped speaker labels are annotated. We achieved a 7.6% diarization error rate on NIST SRE 2000 CALLHOME, which is better than the state-of-the-art method using spectral clustering. Moreover, our method decodes in an online fashion while most state-of-the-art systems rely on offline clustering.
It is a significant problem to predict the 2D LiDAR map at next moment for robotics navigation and path-planning. To tackle this problem, we resort to the motion flow between adjacent maps, as motion flow is a powerful tool to process and analyze the dynamic data, which is named optical flow in video processing. However, unlike video, which contains abundant visual features in each frame, a 2D LiDAR map lacks distinctive local features. To alleviate this challenge, we propose to estimate the motion flow based on deep neural networks inspired by its powerful representation learning ability in estimating the optical flow of the video. To this end, we design a recurrent neural network based on gated recurrent unit, which is named LiDAR-FlowNet. As a recurrent neural network can encode the temporal dynamic information, our LiDAR-FlowNet can estimate motion flow between the current map and the unknown next map only from the current frame and previous frames. A self-supervised strategy is further designed to train the LiDAR-FlowNet model effectively, while no training data need to be manually annotated. With the estimated motion flow, it is straightforward to predict the 2D LiDAR map at the next moment. Experimental results verify the effectiveness of our LiDAR-FlowNet as well as the proposed training strategy. The results of the predicted LiDAR map also show the advantages of our motion flow based method.