Intelligent Transportation Systems (ITS) have become an important pillar in modern "smart city" framework which demands intelligent involvement of machines. Traffic load recognition can be categorized as an important and challenging issue for such systems. Recently, Convolutional Neural Network (CNN) models have drawn considerable amount of interest in many areas such as weather classification, human rights violation detection through images, due to its accurate prediction capabilities. This work tackles real-life traffic load recognition problem on System-On-a-Programmable-Chip (SOPC) platform and coin it as MAT-CNN- SOPC, which uses an intelligent re-training mechanism of the CNN with known environments. The proposed methodology is capable of enhancing the efficacy of the approach by 2.44x in comparison to the state-of-art and proven through experimental analysis. We have also introduced a mathematical equation, which is capable of quantifying the suitability of using different CNN models over the other for a particular application based implementation.
We study the impact of neural networks in text classification. Our focus is on training deep neural networks with proper weight initialization and greedy layer-wise pretraining. Results are compared with 1-layer neural networks and Support Vector Machines. We work with a dataset of labeled messages from the Twitter microblogging service and aim to predict weather conditions. A feature extraction procedure specific for the task is proposed, which applies dimensionality reduction using Latent Semantic Analysis. Our results show that neural networks outperform Support Vector Machines with Gaussian kernels, noticing performance gains from introducing additional hidden layers with nonlinearities. The impact of using Nesterov's Accelerated Gradient in backpropagation is also studied. We conclude that deep neural networks are a reasonable approach for text classification and propose further ideas to improve performance.
The recent advances of deep learning in both computer vision (CV) and natural language processing (NLP) provide us a new way of understanding semantics, by which we can deal with more challenging tasks such as automatic description generation from natural images. In this challenge, the encoder-decoder framework has achieved promising performance when a convolutional neural network (CNN) is used as image encoder and a recurrent neural network (RNN) as decoder. In this paper, we introduce a sequential guiding network that guides the decoder during word generation. The new model is an extension of the encoder-decoder framework with attention that has an additional guiding long short-term memory (LSTM) and can be trained in an end-to-end manner by using image/descriptions pairs. We validate our approach by conducting extensive experiments on a benchmark dataset, i.e., MS COCO Captions. The proposed model achieves significant improvement comparing to the other state-of-the-art deep learning models.
In this paper, we propose Generative Adversarial Network (GAN) architectures that use Capsule Networks for image-synthesis. Based on the principal of positional-equivariance of features, Capsule Network's ability to encode spatial relationships between the features of the image helps it become a more powerful critic in comparison to Convolutional Neural Networks (CNNs) used in current architectures for image synthesis. Our proposed GAN architectures learn the data manifold much faster and therefore, synthesize visually accurate images in significantly lesser number of training samples and training epochs in comparison to GANs and its variants that use CNNs. Apart from analyzing the quantitative results corresponding the images generated by different architectures, we also explore the reasons for the lower coverage and diversity explored by the GAN architectures that use CNN critics.
The ever-growing interest witnessed in the acquisition and development of unmanned aerial vehicles (UAVs), commonly known as drones in the past few years, has brought generation of a very promising and effective technology. Because of their characteristic of small size and fast deployment, UAVs have shown their effectiveness in collecting data over unreachable areas and restricted coverage zones. Moreover, their flexible-defined capacity enables them to collect information with a very high level of detail, leading to high resolution images. UAVs mainly served in military scenario. However, in the last decade, they have being broadly adopted in civilian applications as well. The task of aerial surveillance and situation awareness is usually completed by integrating intelligence, surveillance, observation, and navigation systems, all interacting in the same operational framework. To build this capability, UAV's are well suited tools that can be equipped with a wide variety of sensors, such as cameras or radars. Deep learning has been widely recognized as a prominent approach in different computer vision applications. Specifically, one-stage object detector and two-stage object detector are regarded as the most important two groups of Convolutional Neural Network based object detection methods. One-stage object detector could usually outperform two-stage object detector in speed; however, it normally trails in detection accuracy, compared with two-stage object detectors. In this study, focal loss based RetinaNet, which works as one-stage object detector, is utilized to be able to well match the speed of regular one-stage detectors and also defeat two-stage detectors in accuracy, for UAV based object detection. State-of-the-art performance result has been showed on the UAV captured image dataset-Stanford Drone Dataset (SDD).
We propose a Bayesian convolutional neural network built upon Bayes by Backprop and elaborate how this known method can serve as the fundamental construct of our novel, reliable variational inference method for convolutional neural networks. First, we show how Bayes by Backprop can be applied to convolutional layers where weights in filters have probability distributions instead of point-estimates; and second, how our proposed framework leads with various network architectures to performances comparable to convolutional neural networks with point-estimates weights. In the past, Bayes by Backprop has been successfully utilised in feedforward and recurrent neural networks, but not in convolutional ones. This work symbolises the extension of the group of Bayesian neural networks which encompasses all three aforementioned types of network architectures now.
The task of Question Answering has gained prominence in the past few decades for testing the ability of machines to understand natural language. Large datasets for Machine Reading have led to the development of neural models that cater to deeper language understanding compared to information retrieval tasks. Different components in these neural architectures are intended to tackle different challenges. As a first step towards achieving generalization across multiple domains, we attempt to understand and compare the peculiarities of existing end-to-end neural models on the Stanford Question Answering Dataset (SQuAD) by performing quantitative as well as qualitative analysis of the results attained by each of them. We observed that prediction errors reflect certain model-specific biases, which we further discuss in this paper.
Image captioning is a challenging task that combines the field of computer vision and natural language processing. A variety of approaches have been proposed to achieve the goal of automatically describing an image, and recurrent neural network (RNN) or long-short term memory (LSTM) based models dominate this field. However, RNNs or LSTMs cannot be calculated in parallel and ignore the underlying hierarchical structure of a sentence. In this paper, we propose a framework that only employs convolutional neural networks (CNNs) to generate captions. Owing to parallel computing, our basic model is around 3 times faster than NIC (an LSTM-based model) during training time, while also providing better results. We conduct extensive experiments on MSCOCO and investigate the influence of the model width and depth. Compared with LSTM-based models that apply similar attention mechanisms, our proposed models achieves comparable scores of BLEU-1,2,3,4 and METEOR, and higher scores of CIDEr. We also test our model on the paragraph annotation dataset, and get higher CIDEr score compared with hierarchical LSTMs
Classifying large scale networks into several categories and distinguishing them according to their fine structures is of great importance with several applications in real life. However, most studies of complex networks focus on properties of a single network but seldom on classification, clustering, and comparison between different networks, in which the network is treated as a whole. Due to the non-Euclidean properties of the data, conventional methods can hardly be applied on networks directly. In this paper, we propose a novel framework of complex network classifier (CNC) by integrating network embedding and convolutional neural network to tackle the problem of network classification. By training the classifiers on synthetic complex network data and real international trade network data, we show CNC can not only classify networks in a high accuracy and robustness, it can also extract the features of the networks automatically.
In this paper, we propose a method of improving Convolutional Neural Networks (CNN) by determining the optimal alignment of weights and inputs using dynamic programming. Conventional CNNs convolve learnable shared weights, or filters, across the input data. The filters use a linear matching of weights to inputs using an inner product between the filter and a window of the input. However, it is possible that there exists a more optimal alignment of weights. Thus, we propose the use of Dynamic Time Warping (DTW) to dynamically align the weights to optimized input elements. This dynamic alignment is useful for time series recognition due to the complexities of temporal relations and temporal distortions. We demonstrate the effectiveness of the proposed architecture on the Unipen online handwritten digit and character datasets, the UCI Spoken Arabic Digit dataset, and the UCI Activities of Daily Life dataset.
Automatically creating the description of an image using any natural languages sentence like English is a very challenging task. It requires expertise of both image processing as well as natural language processing. This paper discuss about different available models for image captioning task. We have also discussed about how the advancement in the task of object recognition and machine translation has greatly improved the performance of image captioning model in recent years. In addition to that we have discussed how this model can be implemented. In the end, we have also evaluated the performance of model using standard evaluation matrices.