We explore the possibility of using a single monocular camera to forecast the time to collision between a suitcase-shaped robot being pushed by its user and other nearby pedestrians. We develop a purely image-based deep learning approach that directly estimates the time to collision without the need of relying on explicit geometric depth estimates or velocity information to predict future collisions. While previous work has focused on detecting immediate collision in the context of navigating Unmanned Aerial Vehicles, the detection was limited to a binary variable (i.e., collision or no collision). We propose a more fine-grained approach to collision forecasting by predicting the exact time to collision in terms of milliseconds, which is more helpful for collision avoidance in the context of dynamic path planning. To evaluate our method, we have collected a novel large-scale dataset of over 13,000 indoor video segments each showing a trajectory of at least one person ending in a close proximity (a near collision) with the camera mounted on a mobile suitcase-shaped platform. Using this dataset, we do extensive experimentation on different temporal windows as input using an exhaustive list of state-of-the-art convolutional neural networks (CNNs). Our results show that our proposed multi-stream CNN is the best model for predicting time to near-collision. The average prediction error of our time to near collision is 0.75 seconds across our test environments.
One of the largest obstacles facing scanning probe microscopy is the constant need to correct flaws in the scanning probe in situ. This is currently a manual, time-consuming process that would benefit greatly from automation. Here we introduce a convolutional neural network protocol that enables automated recognition of a variety of desirable and undesirable scanning probe tip states on both metal and non-metal surfaces. By combining the best performing models into majority voting ensembles, we find that the desirable states of H:Si(100) can be distinguished with a mean precision of 0.89 and an average receiver-operator-characteristic curve area of 0.95. More generally, high and low-quality tips can be distinguished with a mean precision of 0.96 and near perfect area-under-curve of 0.98. With trivial modifications, we also successfully automatically identify undesirable, non-surface-specific states on surfaces of Au(111) and Cu(111). In these cases we find mean precisions of 0.95 and 0.75 and area-under-curves of 0.98 and 0.94, respectively.
Neural network-based methods have recently demonstrated state-of-the-art results on image synthesis and super-resolution tasks, in particular by using variants of generative adversarial networks (GANs) with supervised feature losses. Nevertheless, previous feature loss formulations rely on the availability of large auxiliary classifier networks, and labeled datasets that enable such classifiers to be trained. Furthermore, there has been comparatively little work to explore the applicability of GAN-based methods to domains other than images and video. In this work we explore a GAN-based method for audio processing, and develop a convolutional neural network architecture to perform audio super-resolution. In addition to several new architectural building blocks for audio processing, a key component of our approach is the use of an autoencoder-based loss that enables training in the GAN framework, with feature losses derived from unlabeled data. We explore the impact of our architectural choices, and demonstrate significant improvements over previous works in terms of both objective and perceptual quality.
This paper proposes an efficient unsupervised method for detecting relevant changes between two temporally different images of the same scene. A convolutional neural network (CNN) for semantic segmentation is implemented to extract compressed image features, as well as to classify the detected changes into the correct semantic classes. A difference image is created using the feature map information generated by the CNN, without explicitly training on target difference images. Thus, the proposed change detection method is unsupervised, and can be performed using any CNN model pre-trained for semantic segmentation.
We exploit altered patterns in brain functional connectivity as features for automatic discriminative analysis of neuropsychiatric patients. Deep learning methods have been introduced to functional network classification only very recently for fMRI, and the proposed architectures essentially focused on a single type of connectivity measure. We propose a deep convolutional neural network (CNN) framework for classification of electroencephalogram (EEG)-derived brain connectome in schizophrenia (SZ). To capture complementary aspects of disrupted connectivity in SZ, we explore combination of various connectivity features consisting of time and frequency-domain metrics of effective connectivity based on vector autoregressive model and partial directed coherence, and complex network measures of network topology. We design a novel multi-domain connectome CNN (MDC-CNN) based on a parallel ensemble of 1D and 2D CNNs to integrate the features from various domains and dimensions using different fusion strategies. Hierarchical latent representations learned by the multiple convolutional layers from EEG connectivity reveal apparent group differences between SZ and healthy controls (HC). Results on a large resting-state EEG dataset show that the proposed CNNs significantly outperform traditional support vector machine classifiers. The MDC-CNN with combined connectivity features further improves performance over single-domain CNNs using individual features, achieving remarkable accuracy of $93.06\%$ with a decision-level fusion. The proposed MDC-CNN by integrating information from diverse brain connectivity descriptors is able to accurately discriminate SZ from HC. The new framework is potentially useful for developing diagnostic tools for SZ and other disorders.