Deep Learning in Computer Vision: Methods, Interpretation, Causation, and Fairness

Deep learning models have succeeded at a variety of human intelligence tasks and are already being used at commercial scale. These models largely rely on standard gradient-descent optimization of a function f parameterized by θ, which maps an input x to an output f(x; θ). The optimization procedure minimizes the loss (difference) between the model output f(x; θ) and the actual output y. As an example, in the cancer-detection setting, x is an MRI image and y is the presence or absence of cancer. Three key ingredients hint at the reason behind deep learning's power: (1) deep architectures that are adept at breaking down complex functions into a composition of simpler abstract parts; (2) standard gradient-descent methods that can attain local minima on a nonconvex loss function that are close enough to the global minima; and (3) learning algorithms that can be executed on parallel computing hardware (e.g., graphics processing units), making the optimization viable over hundreds of millions of observations. Computer vision tasks, where the input x is a high-dimensional image or video, are particularly suited to deep learning. Recent advances in deep architectures (e.g., inception modules, attention networks, adversarial networks, and deep reinforcement learning) have opened up completely new applications that were previously unexplored. However, the breakneck progress to replace human tasks with deep learning comes with caveats. These deep models tend to evade interpretation, lack evident causal relationships between input x and output y, and may inadvertently mimic not just human actions but also human biases and stereotypes. In this tutorial, we provide an intuitive explanation of deep learning methods in computer vision as well as their limitations in practice.
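
The optimization loop described above can be sketched in a few lines. This is a toy illustration, not a deep network: f is a one-parameter linear model and the loss is mean squared error, but the gradient-descent update rule is exactly the one deep learning scales up.

```python
import numpy as np

def train(x, y, theta, lr=0.1, steps=100):
    """Minimize the squared loss between f(x; theta) and y by gradient descent.
    Here f(x; theta) = theta * x; deep models compose many such parameterized
    layers, but the update rule below is the same."""
    for _ in range(steps):
        pred = theta * x                    # forward pass: model output f(x; theta)
        grad = np.mean(2 * (pred - y) * x)  # d/dtheta of the mean squared loss
        theta -= lr * grad                  # gradient-descent update
    return theta

x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x                 # ground truth generated with theta* = 2
theta = train(x, y, theta=0.0)
```

On this convex toy problem the iterates converge to the true parameter; on the nonconvex losses of deep networks, the same procedure only finds a (usually good enough) local minimum.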

Interpretability and Explainability in Machine Learning

  • Overview: As machine learning models are increasingly being employed to aid decision makers in high-stakes settings such as healthcare and criminal justice, it is important to ensure that the decision makers (end users) correctly understand and consequently trust the functionality of these models. This graduate-level course aims to familiarize students with the recent advances in the emerging field of interpretable and explainable ML. In this course, we will review seminal position papers of the field, understand the notions of model interpretability and explainability, discuss in detail different classes of interpretable models (e.g., prototype-based approaches, sparse linear models, rule-based techniques, generalized additive models), post-hoc explanations (black-box explanations including counterfactual explanations and saliency maps), and explore the connections between interpretability and causality, debugging, and fairness. The course will also emphasize various applications that can immensely benefit from model interpretability, including criminal justice and healthcare.

In recent years, the biggest advances in major Computer Vision tasks, such as object recognition, handwritten-digit identification, facial recognition, and many others, have all come through the use of Convolutional Neural Networks (CNNs). Similarly, in the domain of Natural Language Processing, Recurrent Neural Networks (RNNs), and Long Short-Term Memory networks (LSTMs) in particular, have been crucial to some of the biggest breakthroughs in performance for tasks such as machine translation, part-of-speech tagging, sentiment analysis, and many others. These individual advances have greatly benefited tasks even at the intersection of NLP and Computer Vision. Inspired by this success, in this work we study some existing neural image captioning models that provide near state-of-the-art performance, and try to enhance one such model. We also present a simple image captioning model that makes use of a CNN, an LSTM, and the beam search algorithm, and study its performance based on various qualitative and quantitative metrics.
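
The beam search used by the captioning model above can be sketched generically: at each step, keep only the top-k highest-scoring partial sequences. The toy transition table here stands in for the LSTM's softmax over the vocabulary; the function names are illustrative, not the paper's API.

```python
import math

def beam_search(step_fn, start, eos, beam_width=3, max_len=10):
    """Generic beam search. step_fn(seq) returns (token, log_prob)
    continuations of a partial sequence; in a captioning model this would
    be the decoder's distribution conditioned on CNN image features."""
    beams = [([start], 0.0)]
    completed = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, logp in step_fn(seq):
                new = (seq + [tok], score + logp)
                (completed if tok == eos else candidates).append(new)
        if not candidates:
            break
        # keep only the beam_width best partial sequences
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    completed.extend(beams)
    return max(completed, key=lambda b: b[1])[0]

# Toy "language model": after "<s>" prefer "a", then "cat", then end.
probs = {"<s>": [("a", math.log(0.6)), ("the", math.log(0.4))],
         "a":   [("cat", math.log(0.9)), ("dog", math.log(0.1))],
         "the": [("cat", math.log(0.5)), ("dog", math.log(0.5))],
         "cat": [("</s>", 0.0)], "dog": [("</s>", 0.0)]}
caption = beam_search(lambda seq: probs[seq[-1]], "<s>", "</s>")
```

Unlike greedy decoding, beam search can recover a globally higher-probability caption even when the best first word is not followed by the best continuation.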

During the last decade, Convolutional Neural Networks (CNNs) have become the de facto standard for various Computer Vision and Machine Learning operations. CNNs are feed-forward Artificial Neural Networks (ANNs) with alternating convolutional and subsampling layers. Deep 2D CNNs with many hidden layers and millions of parameters have the ability to learn complex objects and patterns, provided that they can be trained on a massive visual database with ground-truth labels. With proper training, this unique ability makes them the primary tool for various engineering applications over 2D signals such as images and video frames. Yet, this may not be a viable option in numerous applications over 1D signals, especially when the training data is scarce or application-specific. To address this issue, 1D CNNs have recently been proposed and immediately achieved state-of-the-art performance levels in several applications such as personalized biomedical data classification and early diagnosis, structural health monitoring, anomaly detection and identification in power electronics, and motor-fault detection. Another major advantage is that a real-time and low-cost hardware implementation is feasible due to the simple and compact configuration of 1D CNNs, which perform only 1D convolutions (scalar multiplications and additions). This paper presents a comprehensive review of the general architecture and principles of 1D CNNs along with their major engineering applications, especially focused on the recent progress in this field. Their state-of-the-art performance is highlighted, concluding with their unique properties. The benchmark datasets and the principal 1D CNN software used in those applications are also publicly shared on a dedicated website.
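
The "only scalar multiplications and additions" point is easy to see in code. Below is a minimal valid-mode 1D convolution (cross-correlation, as CNN layers actually compute it), with an illustrative edge-detecting kernel; it is a sketch of the operation, not of any particular library's layer.

```python
def conv1d(signal, kernel, stride=1):
    """Valid-mode 1D convolution: slide the kernel over the signal and
    accumulate scalar products, stepping by `stride`."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]

# A [1, -1] difference kernel responds to local changes -- a toy stand-in
# for the kind of filter a 1D CNN might learn for anomaly detection.
out = conv1d([0.0, 0.0, 5.0, 0.0, 0.0], [1.0, -1.0])
```

The spike at position 2 produces a strong positive/negative response pair, which is why such filters are cheap yet effective feature extractors on 1D sensor data.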

In structure learning, the output is generally a structure that is used as supervision information to achieve good performance. Since interpreting deep learning models has attracted extensive attention in recent years, it would be beneficial if we could learn an interpretable structure from deep learning models. In this paper, we focus on Recurrent Neural Networks (RNNs), whose inner mechanism is still not clearly understood. We find that a Finite State Automaton (FSA) that processes sequential data has a more interpretable inner mechanism and can be learned from RNNs as the interpretable structure. We propose two methods to learn an FSA from an RNN, based on two different clustering methods. We first give a graphical illustration of the FSA that humans can follow, which shows the interpretability. From the FSA's point of view, we then analyze how the performance of RNNs is affected by the number of gates, as well as the semantic meaning behind the transitions of numerical hidden states. Our results suggest that RNNs with a simple gated structure such as the Minimal Gated Unit (MGU) are more desirable, and that the transitions in the FSA leading to a specific classification result are associated with corresponding words that are understandable to humans.
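
The core idea, discretizing continuous hidden states and recording input-driven transitions between the discrete states, can be sketched as follows. Here simple rounding stands in for the paper's clustering methods, and the hidden-state trajectories are invented toy values, not output of a real RNN.

```python
from collections import defaultdict

def learn_fsa(hidden_traces, inputs, n_states=2):
    """Discretize scalar RNN hidden states into n_states buckets (a crude
    stand-in for clustering), then record which input symbol moves one
    discrete state to another. Returns {(state, symbol): {next_states}}."""
    def discretize(h):
        return round(h * (n_states - 1))
    transitions = defaultdict(set)
    for trace, seq in zip(hidden_traces, inputs):
        for t in range(1, len(trace)):
            src, dst = discretize(trace[t - 1]), discretize(trace[t])
            transitions[(src, seq[t])].add(dst)
    return dict(transitions)

# Two toy hidden-state trajectories of a hypothetical RNN reading 0/1 strings.
fsa = learn_fsa(hidden_traces=[[0.1, 0.9, 0.1], [0.1, 0.2, 0.8]],
                inputs=[["_", "1", "1"], ["_", "0", "1"]])
```

If a (state, symbol) pair maps to more than one next state, the extracted automaton is nondeterministic, which is exactly the kind of ambiguity the paper's clustering choices are designed to reduce.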

Interaction and collaboration between humans and intelligent machines have become increasingly important as machine learning methods move into real-world applications that involve end users. While much prior work lies at the intersection of natural language and vision, such as image captioning or image generation from text descriptions, less focus has been placed on the use of language to guide or improve the performance of a learned visual processing algorithm. In this paper, we explore methods to flexibly guide a trained convolutional neural network through user input to improve its performance during inference. We do so by inserting a layer that acts as a spatio-semantic guide into the network. This guide is trained to modify the network's activations, either directly via an energy minimization scheme or indirectly through a recurrent model that translates human language queries to interaction weights. Learning the verbal interaction is fully automatic and does not require manual text annotations. We evaluate the method on two datasets, showing that guiding a pre-trained network can improve performance, and provide extensive insights into the interaction between the guide and the CNN.
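
The mechanics of "modifying the network's activations" can be illustrated with a simple re-weighting of a feature map by per-location interaction weights. In the paper those weights are learned from user queries; the values below are purely illustrative.

```python
import numpy as np

def guided_activations(feature_map, guide_weights):
    """Sketch of a spatio-semantic guide layer: re-weight a CNN feature map
    of shape (channels, H, W) with per-location weights of shape (H, W),
    amplifying some spatial regions and suppressing others."""
    return feature_map * guide_weights[None, :, :]  # broadcast over channels

features = np.ones((4, 2, 2))                  # pretend conv activations
weights = np.array([[2.0, 1.0],                # "attend to the top-left,
                    [1.0, 0.0]])               #  ignore the bottom-right"
out = guided_activations(features, weights)
```

Because the guide only rescales activations, it can be inserted into a pre-trained network without retraining the convolutional weights themselves.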

In recent years, deep neural networks have yielded state-of-the-art performance on several tasks. Although some recent works have focused on combining deep learning with recommendation, we highlight three issues in existing works. First, most works perform deep content feature learning and resort to matrix factorization, which cannot effectively model the highly complex user-item interaction function. Second, due to the difficulty of training deep neural networks, existing models use a shallow architecture, and thus limit the expressive potential of deep learning. Third, neural network models are prone to overfitting in the implicit-feedback setting, because negative interactions are not taken into account. To tackle these issues, we present a generic recommender framework called Neural Collaborative Autoencoder (NCAE) to perform collaborative filtering, which works well for both explicit and implicit feedback. NCAE can effectively capture the relationship between interactions via a non-linear matrix factorization process. To optimize the deep architecture of NCAE, we develop a three-stage pre-training mechanism that combines supervised and unsupervised feature learning. Moreover, to prevent overfitting in the implicit setting, we propose an error reweighting module and a sparsity-aware data-augmentation strategy. Extensive experiments on three real-world datasets demonstrate that NCAE can significantly advance the state-of-the-art.
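
The error-reweighting idea for implicit feedback can be sketched as a weighted reconstruction loss: observed interactions get full weight, while unobserved entries, which are only *presumed* negative, contribute less. The `neg_weight` value and matrix sizes below are illustrative, not the paper's settings.

```python
import numpy as np

def reweighted_loss(pred, interactions, neg_weight=0.2):
    """Weighted squared error over a user-item interaction matrix:
    entries with an observed interaction (1) get weight 1.0, unobserved
    entries (0) get a smaller weight so missing data is not treated as a
    confident negative signal."""
    weights = np.where(interactions > 0, 1.0, neg_weight)
    return float(np.sum(weights * (pred - interactions) ** 2))

R = np.array([[1, 0],        # toy 2-user x 2-item implicit feedback matrix
              [0, 1]], dtype=float)
P = np.array([[0.9, 0.5],    # model's reconstructed preferences
              [0.5, 0.9]])
loss = reweighted_loss(P, R)
```

With `neg_weight=1.0` this reduces to ordinary squared error; shrinking it trades off how strongly the model is pushed to score unobserved items at zero.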

This paper presents a method of learning qualitatively interpretable models in object detection using popular two-stage region-based ConvNet detection systems (i.e., R-CNN). R-CNN consists of a region proposal network and a RoI (Region-of-Interest) prediction network. By interpretable models, we mean weakly-supervised extractive rationale generation, that is, learning to unfold latent discriminative part configurations of object instances automatically and simultaneously during detection, without using any supervision for part configurations. We utilize a top-down hierarchical and compositional grammar model embedded in a directed acyclic AND-OR Graph (AOG) to explore and unfold the space of latent part configurations of RoIs. We propose an AOGParsing operator to substitute for the RoIPooling operator widely used in R-CNN, so the proposed method is applicable to many state-of-the-art ConvNet-based detection systems. The AOGParsing operator aims to harness both the explainable rigor of top-down hierarchical and compositional grammar models and the discriminative power of bottom-up deep neural networks through end-to-end training. In detection, a bounding box is interpreted by the best parse tree derived from the AOG on the fly, which is treated as the extractive rationale generated for interpreting the detection. In learning, we propose a folding-unfolding method to train the AOG and ConvNet end-to-end. In experiments, we build on top of R-FCN and test the proposed method on the PASCAL VOC 2007 and 2012 datasets with performance comparable to state-of-the-art methods.
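
The notion of "the best parse tree derived from the AOG" can be sketched abstractly: an AND node combines (sums) its children's scores, while an OR node selects its best-scoring alternative. The graph structure and scores below are toy values for illustration, not the paper's learned model.

```python
def best_parse(node, graph, scores):
    """Return (score, parse_tree_nodes) for the best parse rooted at `node`.
    graph maps node -> (kind, children) with kind in {"AND", "OR", "T"};
    terminals ("T") read their score from `scores`."""
    kind, children = graph[node]
    if kind == "T":
        return scores[node], [node]
    child_results = [best_parse(c, graph, scores) for c in children]
    if kind == "AND":  # composition: all children are part of the parse
        return (sum(s for s, _ in child_results),
                [node] + [n for _, t in child_results for n in t])
    # OR: choose the single highest-scoring alternative
    s, t = max(child_results, key=lambda r: r[0])
    return s, [node] + t

# Toy AOG: an object is either detected as a whole, or as head + body parts.
graph = {"obj":   ("OR",  ["whole", "parts"]),
         "whole": ("T",   []),
         "parts": ("AND", ["head", "body"]),
         "head":  ("T",   []),
         "body":  ("T",   [])}
score, tree = best_parse("obj", graph, {"whole": 0.4, "head": 0.3, "body": 0.5})
```

The selected branches of the OR nodes are what make the parse tree a readable rationale: they record *which* part configuration explained the detection.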
