Magic is the art of producing in the spectator an illusion of impossibility. Although the scientific study of magic is in its infancy, recent tracking algorithms based on deep learning now make it possible to quantify the skills of the magician in naturalistic conditions with unprecedented resolution and robustness. In this study, we deconstructed stage magic into purely motor maneuvers and trained an artificial neural network (DeepLabCut) to follow coins as a professional magician made them appear and disappear in a series of tricks. Rather than using AI as a mere tracking tool, we conceived it as an "artificial spectator". When the coins were not visible, the algorithm was trained to infer their location as a human spectator would (i.e., in the left fist). This created situations where the human was fooled while the AI (as seen by a human) was not, and vice versa. Magic from the perspective of the machine reveals our own cognitive biases.
Can an arbitrarily intelligent reinforcement learning agent be kept under control by a human user? Or do agents with sufficient intelligence inevitably find ways to shortcut their reward signal? This question impacts how far reinforcement learning can be scaled, and whether alternative paradigms must be developed in order to build safe artificial general intelligence. In this paper, we use an intuitive yet precise graphical model called causal influence diagrams to formalize reward tampering problems. We also describe a number of modifications to the reinforcement learning objective that prevent incentives for reward tampering. We verify the solutions using recently developed graphical criteria for inferring agent incentives from causal influence diagrams. Along the way, we also compare corrigibility and self-preservation properties of the various solutions, and discuss how they can be combined into a single agent without reward tampering incentives.
Process graph extraction (PGE) is a recently emerged interdisciplinary field between natural language processing and business process management, which aims to extract process graphs expressed in texts. Previous process extractors depend heavily on manual features and ignore the potential relations between clues of different text granularities. In this paper, we formalize the PGE task as a multi-granularity text classification problem, and propose a hierarchical model to effectively model and extract multi-granularity information without manually defined procedural knowledge. Under this framework, we propose a coarse-to-fine learning mechanism, which trains multi-granularity tasks in coarse-to-fine order so that high-level knowledge is shared with the low-level tasks. To evaluate our approach, we construct two finer-grained datasets from two sentence-level corpora and conduct extensive experiments along different dimensions. The experimental results demonstrate that our approach outperforms the state-of-the-art methods with statistical significance, and the ablation studies demonstrate its effectiveness.
Purpose: Manual feedback from senior surgeons observing less experienced trainees is a laborious task that is very expensive, time-consuming and prone to subjectivity. With the number of surgical procedures increasing annually, there is an unprecedented need to provide an accurate, objective and automatic evaluation of trainees' surgical skills in order to improve surgical practice. Methods: In this paper, we designed a convolutional neural network (CNN) to classify surgical skills by extracting latent patterns in the trainees' motions performed during robotic surgery. The method is validated on the JIGSAWS dataset for two surgical skills evaluation tasks: classification and regression. Results: Our results show that deep neural networks constitute robust machine learning models that are able to reach new competitive state-of-the-art performance on the JIGSAWS dataset. While leveraging the efficiency of CNNs, we were able to minimize their black-box effect using the class activation map technique. Conclusions: This characteristic allowed our method to automatically pinpoint which parts of the surgery influenced the skill evaluation the most, thus allowing us to explain a surgical skill classification and provide surgeons with a novel personalized feedback technique. We believe this type of interpretable machine learning model could be integrated into "Operation Room 2.0" and support novice surgeons in improving their skills to eventually become experts.
Knowledge graphs have attracted considerable attention in academic and industrial environments. Despite their usefulness, popular knowledge graphs suffer from incompleteness of information, especially in their type assertions. This has encouraged research in the automatic discovery of entity types. In this context, multiple works have used logical inference on ontologies and statistical machine learning methods to learn type assertions in knowledge graphs. However, these approaches suffer from limited performance on noisy data, limited scalability and dependence on labeled training samples. In this work, we propose a new unsupervised approach that learns to categorize entities into a hierarchy of named groups. We show that our approach is able to effectively learn entity groups using a scalable procedure in noisy and sparse datasets. We evaluate our approach on a set of popular knowledge graph benchmarking datasets, and we publish a collection of the resulting group hierarchies.
Facial pose estimation has attracted considerable attention in many practical applications, such as human-robot interaction, gaze estimation and driver monitoring. Meanwhile, end-to-end deep learning-based facial pose estimation is becoming increasingly popular. However, facial pose estimation suffers from a key challenge: the lack of sufficient training data for many poses, especially for large poses. Inspired by the observation that faces under close poses look similar, we reformulate facial pose estimation as a label distribution learning problem, considering each face image as an example associated with a Gaussian label distribution rather than a single label. We construct a convolutional neural network, trained with a multi-loss function on the AFLW and 300WLP datasets, to predict facial poses directly from color images. Extensive experiments are conducted on several popular benchmarks, including AFLW2000, BIWI, AFLW and AFW, where our approach shows a significant advantage over other state-of-the-art methods.
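The Gaussian label distribution described above can be illustrated with a minimal sketch. The bin layout (yaw from -99 to 99 degrees in 3-degree steps), the value of `sigma`, and the function names are assumptions made for illustration and are not taken from the abstract.

```python
import math

def gaussian_label_distribution(angle_deg, bin_centers, sigma=3.0):
    """Turn a single ground-truth angle into a normalized Gaussian
    distribution over discrete pose bins (sigma is a hypothetical choice)."""
    weights = [math.exp(-((c - angle_deg) ** 2) / (2.0 * sigma ** 2))
               for c in bin_centers]
    total = sum(weights)
    return [w / total for w in weights]

def expected_angle(distribution, bin_centers):
    """Recover a continuous angle as the expectation over the bins."""
    return sum(p * c for p, c in zip(distribution, bin_centers))

# Hypothetical binning: 67 bins covering -99..99 degrees in 3-degree steps.
bin_centers = [-99 + 3 * i for i in range(67)]
dist = gaussian_label_distribution(30.0, bin_centers, sigma=3.0)
```

Under this reading, neighbouring bins receive non-zero probability mass, so an image labelled with one pose also acts as a weak training example for nearby poses.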
Knowledge graph embedding models have gained significant attention in AI research. Recent works have shown that the inclusion of background knowledge, such as logical rules, can improve the performance of embeddings in downstream machine learning tasks. However, so far, most existing models do not allow the inclusion of rules. We address the challenge of including rules and present a new neural-based embedding model (LogicENN). We prove that LogicENN can learn every ground truth of encoded rules in a knowledge graph. To the best of our knowledge, this has not been proved so far for the neural-based family of embedding models. Moreover, we derive formulae for the inclusion of various rules, including (anti-)symmetric, inverse, irreflexive and transitive, implication, composition, equivalence and negation. Our formulation makes it possible to avoid grounding for implication and equivalence relations. Our experiments show that LogicENN outperforms the state-of-the-art models in link prediction.
Deep neural networks (DNNs) have achieved significant success in a variety of real-world applications, e.g., image classification. However, the large number of parameters in these networks restricts their efficiency due to the large model size and the intensive computation. To address this issue, various approximation techniques have been investigated, which seek a lightweight network with little performance degradation in exchange for smaller model size or faster inference. Both low-rankness and sparsity are appealing properties for network approximation. In this paper we propose a unified framework to compress convolutional neural networks (CNNs) by combining these two properties, while taking the nonlinear activation into consideration. Each layer in the network is approximated by the sum of a structured sparse component and a low-rank component, which is formulated as an optimization problem. Then, an extended version of the alternating direction method of multipliers (ADMM) with guaranteed convergence is presented to solve the relaxed optimization problem. Experiments are carried out on VGG-16, AlexNet and GoogLeNet with large image classification datasets. The results outperform previous work in terms of accuracy degradation, compression rate and speedup ratio. The proposed method is able to remarkably compress the model (with up to 4.9x reduction of parameters) with little or no loss of accuracy.
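To make the idea of approximating a weight matrix by a sparse component plus a low-rank component concrete, the following is a minimal sketch. It is not the ADMM procedure of the abstract: it uses a simple alternating heuristic, element-wise (unstructured) sparsity, a rank-1 low-rank part, and ignores the nonlinear activation. All function names and the toy matrix `W` are hypothetical.

```python
import math

def subtract(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def frobenius(A):
    return math.sqrt(sum(x * x for row in A for x in row))

def rank1_approx(M, iters=100):
    """Best rank-1 approximation of M (list of rows) via power iteration."""
    rows, cols = len(M), len(M[0])
    v = [1.0] * cols
    for _ in range(iters):
        u = [sum(M[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        v = [sum(M[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return [[0.0] * cols for _ in range(rows)]
        v = [x / norm for x in v]
    u = [sum(M[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return [[u[i] * v[j] for j in range(cols)] for i in range(rows)]

def keep_largest(M, k):
    """Zero all but the k largest-magnitude entries (ties may keep more)."""
    if k <= 0:
        return [[0.0] * len(M[0]) for _ in M]
    threshold = sorted((abs(x) for row in M for x in row), reverse=True)[k - 1]
    return [[x if abs(x) >= threshold else 0.0 for x in row] for row in M]

def sparse_plus_low_rank(W, k, rounds=20):
    """Alternate between fitting the low-rank part L and the sparse part S."""
    S = [[0.0] * len(W[0]) for _ in W]
    L = S
    for _ in range(rounds):
        L = rank1_approx(subtract(W, S))
        S = keep_largest(subtract(W, L), k)
    return S, L

# Toy "layer": a rank-1 matrix with one large outlier weight at position (0, 0).
W = [[11.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0], [3.0, 3.0, 3.0, 3.0]]
S, L = sparse_plus_low_rank(W, k=1)
```

In this toy setting, the residual of `S + L` can be compared against the residual of a rank-1 approximation alone to see what the extra sparse term buys.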
Graph Neural Networks (GNNs) have been widely used for analyzing non-Euclidean data such as social network data and biological data. Despite their success, the design of graph neural networks requires a lot of manual work and domain knowledge. In this paper, we propose a Graph Neural Architecture Search method (GraphNAS for short) that enables automatic search of the best graph neural architecture based on reinforcement learning. Specifically, GraphNAS first uses a recurrent network to generate variable-length strings that describe the architectures of graph neural networks, and then trains the recurrent network with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation data set. Extensive experimental results on node classification tasks in both transductive and inductive learning settings demonstrate that GraphNAS can achieve consistently better performance on the Cora, Citeseer and Pubmed citation networks and on the protein-protein interaction network. On node classification tasks, GraphNAS can design a novel network architecture that rivals the best human-invented architecture in terms of test set accuracy.
A massive number of well-trained deep networks have been released by developers online. These networks may focus on different tasks and in many cases are optimized for different datasets. In this paper, we study how to exploit such heterogeneous pre-trained networks, known as teachers, so as to train a customized student network that tackles a set of selective tasks defined by the user. We assume no human annotations are available, and each teacher may be either single- or multi-task. To this end, we introduce a dual-step strategy that first extracts the task-specific knowledge from the heterogeneous teachers sharing the same sub-task, and then amalgamates the extracted knowledge to build the student network. To facilitate the training, we employ a selective learning scheme where, for each unlabelled sample, the student learns adaptively from only the teacher with the least prediction ambiguity. We evaluate the proposed approach on several datasets, and the experimental results demonstrate that the student, learned by such adaptive knowledge amalgamation, achieves performance even better than that of the teachers.
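The selective learning scheme above picks, for each unlabelled sample, the teacher with the least prediction ambiguity. A minimal sketch follows, assuming that ambiguity is measured as the Shannon entropy of each teacher's predicted class probabilities; the abstract does not specify the measure, so this choice and the function names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def pick_least_ambiguous(teacher_probs):
    """Return the index of the teacher whose prediction has the lowest entropy.

    teacher_probs: one probability vector per teacher for the same sample.
    """
    return min(range(len(teacher_probs)),
               key=lambda i: entropy(teacher_probs[i]))

# Two hypothetical teachers scoring the same unlabelled sample.
predictions = [[0.5, 0.5], [0.9, 0.1]]
chosen = pick_least_ambiguous(predictions)
```

The student would then be trained on this sample using only the output of the chosen teacher.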
Artificial Intelligence (AI) is a cognitive science that enables humans to explore many intelligent ways to model our sensing and reasoning processes. Industrial AI is a systematic discipline that enables engineers to develop and deploy AI algorithms with repeatable and consistent success. In this paper, the key enablers for this transformative technology, along with their significant advantages, are discussed. In addition, this research explains Lighthouse Factories, an emerging status applied to the top manufacturers that have implemented Industrial AI in their manufacturing ecosystem and gained significant financial benefits. It is believed that this research will serve as a guideline and roadmap for researchers and industries towards the real-world implementation of Industrial AI.
Word embeddings, a popular approach to unsupervised learning of word relationships, are widely used in natural language processing. In this article, we present a new set of embeddings for medical concepts learned using an extremely large collection of multimodal medical data. Leaning on recent theoretical insights, we demonstrate how an insurance claims database of 60 million members, a collection of 20 million clinical notes, and 1.7 million full text biomedical journal articles can be combined to embed concepts into a common space, resulting in the largest ever set of embeddings for 108,477 medical concepts. To evaluate our approach, we present a new benchmark methodology based on statistical power specifically designed to test embeddings of medical concepts. Our approach, called cui2vec, attains state-of-the-art performance relative to previous methods in most instances. Finally, we provide a downloadable set of pre-trained embeddings for other researchers to use, as well as an online tool for interactive exploration of the cui2vec embeddings.
Recently, there have been several high-profile achievements of agents learning to play games against humans and beat them. In this paper, we study the problem of training intelligent agents in service of game development. Unlike the agents built to "beat the game", our agents aim to produce human-like behavior to help with game evaluation and balancing. We discuss two fundamental metrics based on which we measure the human-likeness of agents, namely skill and style, which are multi-faceted concepts with practical implications outlined in this paper. We discuss how this framework applies to multiple games under development at Electronic Arts, followed by some of the lessons learned.
We classify .NET files as either benign or malicious by examining directed graphs derived from the set of functions comprising the given file. Each graph is viewed probabilistically as a Markov chain where each node represents a code block of the corresponding function, and by computing the PageRank vector (Perron vector with transport), a probability measure can be defined over the nodes of the given graph. Each graph is vectorized by computing Lebesgue antiderivatives of hand-engineered functions defined on the vertex set of the given graph against the PageRank measure. Files are subsequently vectorized by aggregating the set of vectors corresponding to the set of graphs resulting from decompiling the given file. The result is a fast, intuitive, and easy-to-compute glass-box vectorization scheme, which can be leveraged for training a standalone classifier or to augment an existing feature space. We refer to this vectorization technique as PageRank Measure Integration Vectorization (PMIV). We demonstrate the efficacy of PMIV by training a vanilla random forest on 2.5 million samples of decompiled .NET, evenly split between benign and malicious, from our in-house corpus and compare this model to a baseline model which leverages a text-only feature space. The median time needed for decompilation and scoring was 24ms.
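The core step of the vectorization described above, defining a probability measure over graph nodes via PageRank and integrating a node function against it, can be sketched as follows. This is a simplified illustration only: it computes a plain integral (a weighted sum) rather than the antiderivatives mentioned in the abstract, and the damping factor, the toy graph and the node function `block_sizes` are hypothetical.

```python
def pagerank(adj, damping=0.85, iters=100):
    """PageRank by power iteration; adj[u] lists the successors of node u."""
    n = len(adj)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for u, outs in enumerate(adj):
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # Dangling node: spread its mass uniformly over all nodes.
                for v in range(n):
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

def integrate(f_values, measure):
    """Integral of a node function against a discrete measure."""
    return sum(f * m for f, m in zip(f_values, measure))

# Toy control-flow graph with four code blocks; node 3 has no successors.
graph = [[1, 2], [3], [3], []]
measure = pagerank(graph)
# Hypothetical hand-engineered function: instruction count per code block.
block_sizes = [5.0, 2.0, 7.0, 1.0]
feature = integrate(block_sizes, measure)
```

Each hand-engineered function would contribute one such number per graph, and the per-graph vectors could then be aggregated into a file-level vector.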
Successful robot-assisted feeding requires bite acquisition of a wide variety of food items. Different food items may require different manipulation actions for successful bite acquisition. Therefore, a key challenge is to handle previously unseen food items with very different action distributions. By leveraging contexts from previous bite acquisition attempts, a robot should be able to learn online how to acquire those previously unseen food items. In this ongoing work, we construct a contextual bandit framework for this problem setting. We then propose variants of the $\epsilon$-greedy and LinUCB contextual bandit algorithms to minimize cumulative regret within that setting. In the future, we expect empirical estimates of cumulative regret for each algorithm on robot bite acquisition trials as well as updated theoretical regret bounds that leverage the more structured context of this problem setting.
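For readers unfamiliar with the setting, a generic $\epsilon$-greedy contextual bandit can be sketched as below. This is a textbook-style illustration, not the variant proposed in the abstract: it assumes discrete, hashable contexts (such as a food label) and keeps a running mean reward per (context, action) pair. The class name, contexts and action indices are hypothetical.

```python
import random

class EpsilonGreedyContextualBandit:
    """Tabular epsilon-greedy bandit with one reward estimate per (context, action)."""

    def __init__(self, n_actions, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {}  # (context, action) -> number of pulls
        self.means = {}   # (context, action) -> running mean reward

    def select(self, context):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return max(range(self.n_actions),
                   key=lambda a: self.means.get((context, a), 0.0))

    def update(self, context, action, reward):
        # Incremental mean update for the observed (context, action) pair.
        key = (context, action)
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        mean = self.means.get(key, 0.0)
        self.means[key] = mean + (reward - mean) / n

# Hypothetical usage: three manipulation actions, reward 1.0 for a successful bite.
bandit = EpsilonGreedyContextualBandit(n_actions=3, epsilon=0.1)
action = bandit.select("grape")
bandit.update("grape", action, reward=1.0)
```

A LinUCB-style alternative would replace the table with a per-action linear model over context features and add an upper-confidence bonus to each estimate.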