A dominant paradigm for learning-based approaches in computer vision is training generic models, such as ResNet for image recognition, or I3D for video understanding, on large datasets and allowing them to discover the optimal representation for the problem at hand. While this is an obviously attractive approach, it is not applicable in all scenarios. We claim that action detection is one such challenging problem - the models that need to be trained are large, and labeled data is expensive to obtain. To address this limitation, we propose to incorporate domain knowledge into the structure of the model, simplifying optimization. In particular, we augment a standard I3D network with a tracking module to aggregate long term motion patterns, and use a graph convolutional network to reason about interactions between actors and objects. Evaluated on the challenging AVA dataset, the proposed approach improves over the I3D baseline by 5.5% mAP and over the state-of-the-art by 4.8% mAP.
With the tremendous growth in the number of scientific papers being published, searching for references while writing a scientific paper is a time-consuming process. A technique that could add a reference citation at the appropriate place in a sentence will be beneficial. In this perspective, context-aware citation recommendation has been researched upon for around two decades. Many researchers have utilized the text data called the context sentence, which surrounds the citation tag, and the metadata of the target paper to find the appropriate cited research. However, the lack of well-organized benchmarking datasets and no model that can attain high performance has made the research difficult. In this paper, we propose a deep learning based model and well-organized dataset for context-aware paper citation recommendation. Our model comprises a document encoder and a context encoder, which uses Graph Convolutional Networks (GCN) layer and Bidirectional Encoder Representations from Transformers (BERT), which is a pre-trained model of textual data. By modifying the related PeerRead dataset, we propose a new dataset called FullTextPeerRead containing context sentences to cited references and paper metadata. To the best of our knowledge, This dataset is the first well-organized dataset for context-aware paper recommendation. The results indicate that the proposed model with the proposed datasets can attain state-of-the-art performance and achieve a more than 28% improvement in mean average precision (MAP) and [email protected]
Over the past 15 years, the volume, richness and quality of data collected from the combined social networking platforms has increased beyond all expectation, providing researchers from a variety of disciplines to use it in their research. Perhaps more impactfully, it has provided the foundation for a range of new products and services, transforming industries such as advertising and marketing, as well as bringing the challenges of sharing personal data into the public consciousness. But how to make sense of the ever-increasing volume of big social data so that we can better understand and improve the user experience in increasingly complex, data-driven digital systems. This link with usability and the user experience of data-driven system bridges into the wider field of HCI, attracting interdisciplinary researchers as we see the demand for consumer technologies, software and systems, as well as the integration of social networks into our everyday lives. The fact that the data largely posted on social networks tends to be textual, provides a further link to linguistics, psychology and psycholinguistics to better understand the relationship between human behaviours offline and online. In this thesis, we present a novel conceptual framework based on a complex digital system using collected longitudinal datasets to predict system status based on the personality traits and emotions extracted from text posted by users. The system framework was built using a dataset collected from an online scholarship system in which 2000 students had their digital behaviour and social network behaviour collected for this study. We contextualise this research project with a wider review and critical analysis of the current psycholinguistics, artificial intelligence and human-computer interaction literature, which reveals a gap of mapping and understanding digital profiling against system status.
Scientific documents rely on both mathematics and text to communicate ideas. Inspired by the topical correspondence between mathematical equations and word contexts observed in scientific texts, we propose a novel topic model that jointly generates mathematical equations and their surrounding text (TopicEq). Using an extension of the correlated topic model, the context is generated from a mixture of latent topics, and the equation is generated by an RNN that depends on the latent topic activations. To experiment with this model, we create a corpus of 400K equation-context pairs extracted from a range of scientific articles from arXiv, and fit the model using a variational autoencoder approach. Experimental results show that this joint model significantly outperforms existing topic models and equation models for scientific texts. Moreover, we qualitatively show that the model effectively captures the relationship between topics and mathematics, enabling novel applications such as topic-aware equation generation, equation topic inference, and topic-aware alignment of mathematical symbols and words.
This paper presents a robust and comprehensive graph-based rank aggregation approach, used to combine results of isolated ranker models in retrieval tasks. The method follows an unsupervised scheme, which is independent of how the isolated ranks are formulated. Our approach is able to combine arbitrary models, defined in terms of different ranking criteria, such as those based on textual, image or hybrid content representations. We reformulate the ad-hoc retrieval problem as a document retrieval of their fusion graph, which we propose as a new unified representation model capable of merging multiple ranks and expressing inter-relationships of retrieval results automatically. By doing so, we claim that the retrieval system can benefit from learning the manifold structure of datasets, thus leading to more effective results. Another contribution is that our graph-based aggregation formulation, unlike existing approaches, allows for encapsulating contextual information encoded from multiple ranks, which can be directly used for ranking, without further computations and processing steps over the graphs. Based on the graphs, a novel similarity retrieval score is formulated using an efficient computation of minimum common subgraphs. Finally, another benefit over existing approaches is the absence of hyperparameters. A comprehensive experimental evaluation was conducted considering diverse well-known public datasets, composed of textual, image, and multimodal documents. Performed experiments demonstrate that our method reaches top performance, yielding better effectiveness scores than state-of-the-art baseline methods and promoting large gains over the rankers being fused, thus showing the successful capability of the proposal in representing queries based on a unified graph-based model of rank fusions.
Electrocardiographic signal is a subject to multiple noises, caused by various factors. It is therefore a standard practice to denoise such signal before further analysis. With advances of new branch of machine learning, called deep learning, new methods are available that promises state-of-the-art performance for this task. We present a novel approach to denoise electrocardiographic signals with deep recurrent denoising neural networks. We utilize a transfer learning technique by pretraining the network using synthetic data, generated by a dynamic ECG model, and fine-tuning it with a real data. We also investigate the impact of the synthetic training data on the network performance on real signals. The proposed method was tested on a real dataset with varying amount of noise. The results indicate that four-layer deep recurrent neural network can outperform reference methods for heavily noised signal. Moreover, networks pretrained with synthetic data seem to have better results than network trained with real data only. We show that it is possible to create state-of-the art denoising neural network that, pretrained on artificial data, can perform exceptionally well on real ECG signals after proper fine-tuning.
In this work, we study value function approximation in reinforcement learning (RL) problems with high dimensional state or action spaces via a generalized version of representation policy iteration (RPI). We consider the limitations of proto-value functions (PVFs) at accurately approximating the value function in low dimensions and we highlight the importance of features learning for an improved low-dimensional value function approximation. Then, we adopt different representation learning algorithm on graphs to learn the basis functions that best represent the value function. We empirically show that node2vec, an algorithm for scalable feature learning in networks, and the Variational Graph Auto-Encoder constantly outperform the commonly used smooth proto-value functions in low-dimensional feature space.
The Computing Community Consortium (CCC), along with the White House Office of Science and Technology Policy (OSTP), and the Association for the Advancement of Artificial Intelligence (AAAI), co-sponsored a public workshop on Artificial Intelligence for Social Good on June 7th, 2016 in Washington, DC. This was one of five workshops that OSTP co-sponsored and held around the country to spur public dialogue on artificial intelligence, machine learning, and to identify challenges and opportunities related to AI. In the AI for Social Good workshop, the successful deployments and the potential use of AI in various topics that are essential for social good were discussed, including but not limited to urban computing, health, environmental sustainability, and public welfare. This report highlights each of these as well as a number of crosscutting issues.
In most agent-based simulators, pedestrians navigate from origins to destinations. Consequently, destinations are essential input parameters to the simulation. While many other relevant parameters as positions, speeds and densities can be obtained from sensors, like cameras, destinations cannot be observed directly. Our research question is: Can we obtain this information from video data using machine learning methods? We use density heatmaps, which indicate the pedestrian density within a given camera cutout, as input to predict the destination distributions. For our proof of concept, we train a Random Forest predictor on an exemplary data set generated with the Vadere microscopic simulator. The scenario is a crossroad where pedestrians can head left, straight or right. In addition, we gain first insights on suitable placement of the camera. The results motivate an in-depth analysis of the methodology.