CVPR 2019 | Quick Tour of 15 Papers (Covering Object Detection, Semantic Segmentation, Pose Estimation, and More)

May 8, 2019 | AI研习社

[Overview] The CVPR 2019 accepted-paper list has been released, but it contains only paper IDs, so there is no complete collection of the papers yet. CVer has recently been collecting and organizing them; today's post is a quick tour of 15 CVPR 2019 papers, covering object detection, semantic segmentation, pose estimation, and other directions.

Pose Estimation

[1] CVPR 2019 pose estimation paper; currently state of the art, code released

Title: Deep High-Resolution Representation Learning for Human Pose Estimation

Authors: Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang

Paper: https://arxiv.org/abs/1902.09212

Code: https://github.com/leoxiaobin/deep-high-resolution-net.pytorch

Abstract: This is an official PyTorch implementation of Deep High-Resolution Representation Learning for Human Pose Estimation. In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process. We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. We empirically demonstrate the effectiveness of our network through the superior pose estimation results over two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset.
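
The core of the method is keeping a high-resolution stream alive while repeatedly fusing it with lower-resolution streams. Below is a minimal sketch (not the authors' code) of one such fusion step between two parallel branches; module names, channel counts, and the bilinear upsampling choice are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchFusion(nn.Module):
    """HRNet-style information exchange between a high-res and a low-res branch."""
    def __init__(self, c_high=32, c_low=64):
        super().__init__()
        # high-to-low: strided 3x3 conv halves the spatial size, matches channels
        self.down = nn.Conv2d(c_high, c_low, kernel_size=3, stride=2, padding=1)
        # low-to-high: 1x1 conv matches channels; bilinear upsampling restores size
        self.up = nn.Conv2d(c_low, c_high, kernel_size=1)

    def forward(self, x_high, x_low):
        # each branch receives the sum of its own features and the other branch's
        fused_high = x_high + F.interpolate(
            self.up(x_low), size=x_high.shape[-2:], mode="bilinear",
            align_corners=False)
        fused_low = x_low + self.down(x_high)
        return fused_high, fused_low

fuse = TwoBranchFusion()
h, l = torch.randn(1, 32, 64, 48), torch.randn(1, 64, 32, 24)
h2, l2 = fuse(h, l)  # the high-res path keeps its 64x48 resolution throughout
```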

Video Object Segmentation

[2] CVPR 2019 VOS paper

Title: FEELVOS: Fast End-to-End Embedding Learning for Video Object Segmentation

Authors: Paul Voigtlaender, Yuning Chai, Florian Schroff, Hartwig Adam, Bastian Leibe, Liang-Chieh Chen

Paper: https://arxiv.org/abs/1902.09513

Abstract: Many of the recent successful methods for video object segmentation (VOS) are overly complicated, heavily rely on fine-tuning on the first frame, and/or are slow, and are hence of limited practical use. In this work, we propose FEELVOS as a simple and fast method which does not rely on fine-tuning. In order to segment a video, for each frame FEELVOS uses a semantic pixel-wise embedding together with a global and a local matching mechanism to transfer information from the first frame and from the previous frame of the video to the current frame. In contrast to previous work, our embedding is only used as an internal guidance of a convolutional network. Our novel dynamic segmentation head allows us to train the network, including the embedding, end-to-end for the multiple object segmentation task with a cross entropy loss. We achieve a new state of the art in video object segmentation without fine-tuning on the DAVIS 2017 validation set with a J&F measure of 69.1%.
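
A minimal sketch of the global matching step described above: each current-frame pixel embedding is compared with the first-frame embeddings of the target object, and the nearest-neighbor distance serves as a soft segmentation cue. The Euclidean distance and the tensor shapes here are assumptions, not the authors' exact formulation.

```python
import torch

def global_matching(emb_cur, emb_ref, ref_mask):
    """Distance map from current-frame pixels to first-frame object pixels.

    emb_cur:  (H*W, C) pixel embeddings of the current frame
    emb_ref:  (N, C)   pixel embeddings of the first frame
    ref_mask: (N,)     bool, True where the reference pixel belongs to the object
    Returns the per-pixel distance to the nearest object pixel, used as an
    internal guidance signal rather than a hard assignment.
    """
    obj = emb_ref[ref_mask]          # (M, C) keep only the object's pixels
    d = torch.cdist(emb_cur, obj)    # (H*W, M) pairwise Euclidean distances
    return d.min(dim=1).values       # nearest-neighbor distance per pixel
```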


Action Recognition

[3] CVPR 2019 action recognition paper

Title: An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition

Authors: Chenyang Si, Wentao Chen, Wei Wang, Liang Wang, Tieniu Tan

Paper: https://arxiv.org/abs/1902.09130

Abstract: Skeleton-based action recognition is an important task that requires the adequate understanding of movement characteristics of a human action from the given skeleton sequence. Recent studies have shown that exploring spatial and temporal features of the skeleton sequence is vital for this task. Nevertheless, how to effectively extract discriminative spatial and temporal features is still a challenging problem. In this paper, we propose a novel Attention Enhanced Graph Convolutional LSTM Network (AGC-LSTM) for human action recognition from skeleton data. The proposed AGC-LSTM can not only capture discriminative features in spatial configuration and temporal dynamics but also explore the co-occurrence relationship between spatial and temporal domains. We also present a temporal hierarchical architecture to increase the temporal receptive field of the top AGC-LSTM layer, which boosts the ability to learn the high-level semantic representation and significantly reduces the computation cost. Furthermore, to select discriminative spatial information, the attention mechanism is employed to enhance the information of key joints in each AGC-LSTM layer. Experimental results on two datasets are provided: the NTU RGB+D dataset and the Northwestern-UCLA dataset. The comparison results demonstrate the effectiveness of our approach and show that our approach outperforms the state-of-the-art methods on both datasets.


Object Detection

[4] New CVPR 2019 detection paper

Title: Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression

Authors: Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, Silvio Savarese

Paper: https://arxiv.org/abs/1902.09630

Abstract: Intersection over Union (IoU) is the most popular evaluation metric used in object detection benchmarks. However, there is a gap between optimizing the commonly used distance losses for regressing the parameters of a bounding box and maximizing this metric value. The optimal objective for a metric is the metric itself. In the case of axis-aligned 2D bounding boxes, it can be shown that IoU can be directly used as a regression loss. However, IoU has a plateau making it infeasible to optimize in the case of non-overlapping bounding boxes. In this paper, we address the weaknesses of IoU by introducing a generalized version as both a new loss and a new metric. By incorporating this generalized IoU (GIoU) as a loss into state-of-the-art object detection frameworks, we show a consistent improvement in their performance using both the standard IoU-based and the new GIoU-based performance measures on popular object detection benchmarks such as PASCAL VOC and MS COCO.
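
The GIoU definition itself is compact: GIoU = IoU - |C \ (A ∪ B)| / |C|, where C is the smallest axis-aligned box enclosing both A and B, and the loss is 1 - GIoU. Here is a minimal PyTorch sketch for corner-format boxes; the box format and the epsilon handling are my choices, not part of the paper's spec.

```python
import torch

def giou_loss(a, b):
    """GIoU loss for axis-aligned boxes in (x1, y1, x2, y2) format.

    GIoU = IoU - |C \ (A U B)| / |C|, where C is the smallest box
    enclosing both A and B; the loss is 1 - GIoU. Unlike plain IoU,
    the enclosing-box term gives a gradient even when A and B do not overlap.
    """
    # intersection of A and B
    lt = torch.max(a[..., :2], b[..., :2])
    rb = torch.min(a[..., 2:], b[..., 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    union = area_a + area_b - inter
    iou = inter / union.clamp(min=1e-7)
    # smallest enclosing box C
    lt_c = torch.min(a[..., :2], b[..., :2])
    rb_c = torch.max(a[..., 2:], b[..., 2:])
    wh_c = (rb_c - lt_c).clamp(min=0)
    area_c = (wh_c[..., 0] * wh_c[..., 1]).clamp(min=1e-7)
    giou = iou - (area_c - union) / area_c
    return 1.0 - giou
```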


Image Classification

[5] New CVPR 2019 classification paper

Title: Learning a Deep ConvNet for Multi-label Classification with Partial Labels

Authors: Thibaut Durand, Nazanin Mehrasa, Greg Mori

Paper: https://arxiv.org/abs/1902.09720

Abstract: Deep ConvNets have shown great performance for single-label image classification (e.g. ImageNet), but it is necessary to move beyond the single-label classification task because pictures of everyday life are inherently multi-label. Multi-label classification is a more difficult task than single-label classification because both the input images and output label spaces are more complex. Furthermore, collecting clean multi-label annotations is more difficult to scale up than single-label annotations. To reduce the annotation cost, we propose to train a model with partial labels, i.e., only some labels are known per image. We first empirically compare different labeling strategies to show the potential for using partial labels on multi-label datasets. Then to learn with partial labels, we introduce a new classification loss that exploits the proportion of known labels per example. Our approach allows the use of the same training settings as when learning with all the annotations. We further explore several curriculum learning based strategies to predict missing labels. Experiments are performed on three large-scale multi-label datasets: MS COCO, NUS-WIDE and Open Images.
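
To make the partial-label idea concrete, here is a hedged sketch of a masked binary cross-entropy in which unknown labels are ignored and each example is reweighted by the proportion of labels that are known. The normalization g(p) = 1/p is a simplification of the paper's loss, and the -1 encoding for unknown labels is an assumption of this sketch.

```python
import torch
import torch.nn.functional as F

def partial_bce(logits, targets):
    """BCE over known labels only, reweighted by the known-label proportion.

    targets: +1 (positive), 0 (negative), -1 (unknown), shape (B, K).
    Unknown entries are masked out, and each example's loss is scaled by
    1 / (proportion of known labels) -- a simplified stand-in for the
    paper's normalization g(p).
    """
    known = targets >= 0                              # (B, K) known-label mask
    prop = known.float().mean(dim=1, keepdim=True)    # proportion known per example
    loss = F.binary_cross_entropy_with_logits(
        logits, targets.clamp(min=0).float(), reduction="none")
    loss = loss * known.float() / prop.clamp(min=1e-7)
    return loss.sum(dim=1).mean()
```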


3D Object Detection

[6] New CVPR 2019 3D detection paper

Title: Stereo R-CNN based 3D Object Detection for Autonomous Driving

Authors: Peiliang Li, Xiaozhi Chen, Shaojie Shen

Paper: https://arxiv.org/abs/1902.09738

Abstract: We propose a 3D object detection method for autonomous driving by fully exploiting the sparse and dense, semantic and geometry information in stereo imagery. Our method, called Stereo R-CNN, extends Faster R-CNN for stereo inputs to simultaneously detect and associate objects in left and right images. We add extra branches after the stereo Region Proposal Network (RPN) to predict sparse keypoints, viewpoints, and object dimensions, which are combined with 2D left-right boxes to calculate a coarse 3D object bounding box. We then recover the accurate 3D bounding box by a region-based photometric alignment using left and right RoIs. Our method requires neither depth input nor 3D position supervision, yet it outperforms all existing fully supervised image-based methods. Experiments on the challenging KITTI dataset show that our method outperforms the state-of-the-art stereo-based method by around 30% AP on both 3D detection and 3D localization tasks. Code will be made publicly available.


3D Reconstruction

[7] New CVPR 2019 3D reconstruction paper

Title: Single-Image Piece-wise Planar 3D Reconstruction via Associative Embedding

Authors: Zehao Yu, Jia Zheng, Dongze Lian, Zihan Zhou, Shenghua Gao

Paper: https://arxiv.org/abs/1902.09777

Code: https://github.com/svip-lab/PlanarReconstruction

Abstract: Single-image piece-wise planar 3D reconstruction aims to simultaneously segment plane instances and recover 3D plane parameters from an image. Most recent approaches leverage convolutional neural networks (CNNs) and achieve promising results. However, these methods are limited to detecting a fixed number of planes in a certain learned order. To tackle this problem, we propose a novel two-stage method based on associative embedding, inspired by its recent success in instance segmentation. In the first stage, we train a CNN to map each pixel to an embedding space where pixels from the same plane instance have similar embeddings. Then, the plane instances are obtained by grouping the embedding vectors in planar regions via an efficient mean shift clustering algorithm. In the second stage, we estimate the parameters for each plane instance by considering both pixel-level and instance-level consistencies. With the proposed method, we are able to detect an arbitrary number of planes. Extensive experiments on public datasets validate the effectiveness and efficiency of our method. Furthermore, our method runs at 30 fps at test time and thus can facilitate many real-time applications such as visual SLAM and human-robot interaction.
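
The first-stage grouping can be sketched in a few lines: per-pixel embeddings are clustered with mean shift, so the number of plane instances falls out of the clustering rather than being fixed in advance. scikit-learn's MeanShift stands in here for the paper's efficient variant, and the bandwidth value is a guess.

```python
import numpy as np
from sklearn.cluster import MeanShift

def group_plane_instances(embeddings, bandwidth=0.5):
    """Group per-pixel embeddings into plane instances via mean shift.

    embeddings: (H, W, C) array from the embedding CNN. sklearn's MeanShift
    is a stand-in for the paper's efficient clustering; bandwidth is assumed.
    Returns an (H, W) instance-label map with as many labels as clusters found.
    """
    h, w, c = embeddings.shape
    flat = embeddings.reshape(-1, c)
    labels = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit_predict(flat)
    return labels.reshape(h, w)
```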


Point Cloud Segmentation

[8] New CVPR 2019 point cloud segmentation paper

Title: Associatively Segmenting Instances and Semantics in Point Clouds

Authors: Xinlong Wang, Shu Liu, Xiaoyong Shen, Chunhua Shen, Jiaya Jia

Paper: https://arxiv.org/abs/1902.09852

Code: https://github.com/WXinlong/ASIS

Abstract: A 3D point cloud describes the real scene precisely and intuitively. To date, how to segment diversified elements in such an informative 3D scene has rarely been discussed. In this paper, we first introduce a simple and flexible framework to segment instances and semantics in point clouds simultaneously. Then, we propose two approaches which make the two tasks take advantage of each other, leading to a win-win situation. Specifically, we make instance segmentation benefit from semantic segmentation through learning semantic-aware point-level instance embedding. Meanwhile, semantic features of the points belonging to the same instance are fused together to make more accurate per-point semantic predictions. Our method largely outperforms the state-of-the-art method in 3D instance segmentation along with a significant improvement in 3D semantic segmentation.


3D Human Pose Estimation

[9] New CVPR 2019 3D human pose estimation paper

Title: RepNet: Weakly Supervised Training of an Adversarial Reprojection Network for 3D Human Pose Estimation

Authors: Bastian Wandt, Bodo Rosenhahn

Paper: https://arxiv.org/abs/1902.09868

Abstract: This paper addresses the problem of 3D human pose estimation from single images. While for a long time human skeletons were parameterized and fitted to the observation by satisfying a reprojection error, nowadays researchers directly use neural networks to infer the 3D pose from the observations. However, most of these approaches ignore the fact that a reprojection constraint has to be satisfied and are sensitive to overfitting. We tackle the overfitting problem by ignoring 2D to 3D correspondences. This efficiently avoids a simple memorization of the training data and allows for weakly supervised training. One part of the proposed reprojection network (RepNet) learns a mapping from a distribution of 2D poses to a distribution of 3D poses using an adversarial training approach. Another part of the network estimates the camera. This allows for the definition of a network layer that performs the reprojection of the estimated 3D pose back to 2D which results in a reprojection loss function. Our experiments show that RepNet generalizes well to unknown data and outperforms state-of-the-art methods when applied to unseen data. Moreover, our implementation runs in real-time on a standard desktop PC.
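
The reprojection layer itself is simple to picture: the camera estimated by one branch maps the predicted 3D joints back to 2D, and the mismatch with the observed 2D keypoints becomes the loss. The following sketch assumes a weak-perspective 2x3 camera matrix and an L2 penalty; both are illustrative choices, not necessarily the paper's exact formulation.

```python
import torch

def reprojection_loss(pose3d, cam, pose2d_obs):
    """Reproject an estimated 3D pose to 2D and compare with the observation.

    pose3d:     (B, J, 3) predicted 3D joints
    cam:        (B, 2, 3) camera matrix estimated by a second network branch
                (a weak-perspective projection, as assumed in this sketch)
    pose2d_obs: (B, J, 2) observed 2D joints
    """
    proj = torch.einsum("bij,bkj->bki", cam, pose3d)  # (B, J, 2) reprojection
    return (proj - pose2d_obs).norm(dim=-1).mean()    # mean per-joint 2D error
```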


3D Face

[10] New CVPR 2019 3D face paper

Title: Disentangled Representation Learning for 3D Face Shape

Authors: Zi-Hang Jiang, Qianyi Wu, Keyu Chen, Juyong Zhang

Paper: https://arxiv.org/abs/1902.09887

Abstract: In this paper, we present a novel strategy to design a disentangled 3D face shape representation. Specifically, a given 3D face shape is decomposed into an identity part and an expression part, which are both encoded and decoded in a nonlinear way. To solve this problem, we propose an attribute decomposition framework for 3D face meshes. Since face shapes are usually nonlinearly deformed relative to one another, they are better represented by a vertex-based deformation representation rather than Euclidean coordinates. The experimental results demonstrate that our method has better performance than existing methods on decomposing the identity and expression parts. Moreover, more natural expression transfer results can be achieved with our method than with existing methods.


Video Captioning

[11] New CVPR 2019 video captioning paper

Title: Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning

Authors: Nayyer Aafaq, Naveed Akhtar, Wei Liu, Syed Zulqarnain Gilani, Ajmal Mian

Paper: https://arxiv.org/abs/1902.10322

Abstract: Automatic generation of video captions is a fundamental challenge in computer vision. Recent techniques typically employ a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) for video captioning. These methods mainly focus on tailoring sequence learning through RNNs for better caption generation, whereas off-the-shelf visual features are borrowed from CNNs. We argue that careful designing of visual features for this task is equally important, and present a visual feature encoding technique to generate semantically rich captions using Gated Recurrent Units (GRUs). Our method embeds rich temporal dynamics in visual features by hierarchically applying the Short Fourier Transform to CNN features of the whole video. It additionally derives high level semantics from an object detector to enrich the representation with spatial dynamics of the detected objects. The final representation is projected to a compact space and fed to a language model. By learning a relatively simple language model comprising two GRU layers, we establish a new state of the art on the MSVD and MSR-VTT datasets for the METEOR and ROUGE_L metrics.
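
One way to picture the hierarchical Fourier encoding: take the per-frame CNN feature sequence, apply an FFT over the whole sequence and then over progressively shorter segments, and keep the low-frequency magnitudes as a fixed-length temporal code. The segmentation scheme, the number of levels, and the truncation length k below are all assumptions of this sketch, not the paper's exact recipe.

```python
import torch

def hierarchical_fourier(features, k=8, levels=2):
    """Encode temporal dynamics of CNN features with hierarchical FFTs.

    features: (T, D) frame features for one video. At each level the sequence
    is split into 2**level segments, and the first k rFFT magnitudes of each
    segment (per feature dimension) are kept, zero-padded if the segment is
    short, so the output length is fixed regardless of T.
    """
    chunks = []
    for level in range(levels):
        for seg in torch.chunk(features, 2 ** level, dim=0):
            spec = torch.fft.rfft(seg, dim=0).abs()  # (F, D) frequency magnitudes
            spec = spec[:k]                          # keep low frequencies only
            if spec.shape[0] < k:                    # pad short segments to k rows
                pad = spec.new_zeros(k - spec.shape[0], spec.shape[1])
                spec = torch.cat([spec, pad], dim=0)
            chunks.append(spec.flatten())
    return torch.cat(chunks)                         # fixed-length video code
```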


Semantic Segmentation

[12] New CVPR 2019 weakly supervised semantic segmentation paper

Title: FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stochastic Inference

Authors: Jungbeom Lee, Eunji Kim, Sungmin Lee, Jangho Lee, Sungroh Yoon

Paper: https://arxiv.org/abs/1902.10421

Abstract: The main obstacle to weakly supervised semantic image segmentation is the difficulty of obtaining pixel-level information from coarse image-level annotations. Most methods based on image-level annotations use localization maps obtained from the classifier, but these only focus on the small discriminative parts of objects and do not capture precise boundaries. FickleNet explores diverse combinations of locations on feature maps created by generic deep neural networks. It selects hidden units randomly and then uses them to obtain activation scores for image classification. FickleNet implicitly learns the coherence of each location in the feature maps, resulting in a localization map which identifies both discriminative and other parts of objects. The ensemble effects are obtained from a single network by selecting random hidden unit pairs, which means that a variety of localization maps are generated from a single image. Our approach does not require any additional training steps and only adds a simple layer to a standard convolutional neural network; nevertheless, it outperforms recent comparable techniques on the Pascal VOC 2012 benchmark in both weakly and semi-supervised settings.
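
The random hidden-unit selection can be approximated with plain dropout kept active at inference: every stochastic forward pass selects a different subset of units and therefore yields a different localization map, which are then aggregated. This is an approximation of FickleNet's mechanism, not its implementation; the classifier head, the max aggregation, and the dropout rate are assumptions.

```python
import torch
import torch.nn as nn

def stochastic_localization(features, classifier, n_samples=10, p=0.5):
    """Aggregate class activation maps over stochastic forward passes.

    features:   (B, C, H, W) feature maps from a generic backbone
    classifier: a 1x1-conv classification head producing (B, K, H, W) scores
    Dropout kept active at inference approximates FickleNet's random
    hidden-unit selection; each pass yields a different localization map.
    """
    drop = nn.Dropout(p)
    drop.train()  # keep dropout stochastic even at inference time
    with torch.no_grad():
        cams = [classifier(drop(features)) for _ in range(n_samples)]
    # aggregate the diverse maps; the element-wise max is this sketch's choice
    return torch.stack(cams).max(dim=0).values
```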


Video Processing

[13] New CVPR 2019 video processing paper

Title: Single-frame Regularization for Temporally Stable CNNs

Authors: Gabriel Eilertsen, Rafał K. Mantiuk, Jonas Unger

Paper: https://arxiv.org/abs/1902.10424

Abstract: Convolutional neural networks (CNNs) can model complicated non-linear relations between images. However, they are notoriously sensitive to small changes in the input. Most CNNs trained to describe image-to-image mappings generate temporally unstable results when applied to video sequences, leading to flickering artifacts and other inconsistencies over time. In order to use CNNs for video material, previous methods have relied on estimating dense frame-to-frame motion information (optical flow) in the training and/or the inference phase, or on exploring recurrent learning structures. We take a different approach to the problem, posing temporal stability as a regularization of the cost function. The regularization is formulated to account for different types of motion that can occur between frames, so that temporally stable CNNs can be trained without the need for video material or expensive motion estimation. The training can be performed as a fine-tuning operation, without architectural modifications of the CNN. Our evaluation shows that the training strategy leads to large improvements in temporal smoothness. Moreover, in situations where the quantity of training data is limited, the regularization can help in boosting the generalization performance to a much larger extent than what is possible with naïve augmentation strategies.
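
A hedged sketch of posing stability as a single-frame regularizer: perturb the input with a small transform standing in for inter-frame motion, and penalize the gap between the output of the transformed input and the transformed output. Using only a random translation is a simplifying assumption; the paper accounts for more general motion types.

```python
import torch
import torch.nn.functional as F

def temporal_stability_reg(model, x, max_shift=4):
    """Single-frame regularizer that encourages temporally stable outputs.

    A small random translation of the input acts as a proxy for frame-to-frame
    motion: we penalize the mismatch between model(shift(x)) and
    shift(model(x)). The translation-only motion model is an assumption.
    """
    dy = int(torch.randint(-max_shift, max_shift + 1, (1,)))
    dx = int(torch.randint(-max_shift, max_shift + 1, (1,)))
    x_shifted = torch.roll(x, shifts=(dy, dx), dims=(2, 3))  # x: (B, C, H, W)
    y_shifted = torch.roll(model(x), shifts=(dy, dx), dims=(2, 3))
    return F.mse_loss(model(x_shifted), y_shifted)
```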


Multi-View Geometry

[14] New CVPR 2019 multi-view geometry paper

Title: Recurrent MVSNet for High-resolution Multi-view Stereo Depth Inference

Authors: Yao Yao, Zixin Luo, Shiwei Li, Tianwei Shen, Tian Fang, Long Quan

Paper: https://arxiv.org/abs/1902.10556

Code: https://github.com/YoYo000/MVSNet

Abstract: Deep learning has recently demonstrated its excellent performance for multi-view stereo (MVS). However, one major limitation of current learned MVS approaches is scalability: the memory-consuming cost volume regularization makes learned MVS hard to apply to high-resolution scenes. In this paper, we introduce a scalable multi-view stereo framework based on the recurrent neural network. Instead of regularizing the entire 3D cost volume in one go, the proposed Recurrent Multi-view Stereo Network (R-MVSNet) sequentially regularizes the 2D cost maps along the depth direction via the gated recurrent unit (GRU). This dramatically reduces memory consumption and makes high-resolution reconstruction feasible. We first show the state-of-the-art performance achieved by the proposed R-MVSNet on the recent MVS benchmarks. Then, we further demonstrate the scalability of the proposed method on several large-scale scenarios, where previous learned approaches often fail due to the memory constraint.
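
The memory argument is easiest to see in code: instead of 3D convolutions over the whole D x H x W cost volume, a convolutional GRU sweeps through the depth planes one 2D cost map at a time. The ConvGRU cell below is a standard formulation and the channel counts are illustrative; it is a sketch of the idea, not the R-MVSNet implementation.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """A standard convolutional GRU cell (channel sizes are illustrative)."""
    def __init__(self, c_in=32, c_hid=32):
        super().__init__()
        self.zr = nn.Conv2d(c_in + c_hid, 2 * c_hid, 3, padding=1)
        self.h = nn.Conv2d(c_in + c_hid, c_hid, 3, padding=1)

    def forward(self, x, h):
        z, r = torch.sigmoid(self.zr(torch.cat([x, h], 1))).chunk(2, dim=1)
        h_cand = torch.tanh(self.h(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_cand

# Regularize one 2D cost map at a time along the depth axis, so peak memory
# scales with H*W rather than with the full D*H*W cost volume.
cell = ConvGRUCell()
cost_maps = torch.randn(192, 1, 32, 64, 80)  # D planes of (B, C, H, W) maps
h = torch.zeros(1, 32, 64, 80)
regularized = []
for cost in cost_maps:   # sweep front-to-back through the depth planes
    h = cell(cost, h)
    regularized.append(h)
```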


Video Classification

[15] New CVPR 2019 video classification paper

Title: Efficient Video Classification Using Fewer Frames

Authors: Shweta Bhardwaj, Mukundhan Srinivasan, Mitesh M. Khapra

Paper: https://arxiv.org/abs/1902.10640

Abstract: Recently, there has been a lot of interest in building compact models for video classification which have a small memory footprint (<1 GB). While these models are compact, they typically operate by repeated application of a small weight matrix to all the frames in a video. For example, recurrent neural network based methods compute a hidden state for every frame of the video using a recurrent weight matrix. Similarly, cluster-and-aggregate based methods such as NetVLAD have a learnable clustering matrix which is used to assign soft clusters to every frame in the video. Since these models look at every frame in the video, the number of floating point operations (FLOPs) is still large even though the memory footprint is small. We focus on building compute-efficient video classification models which process fewer frames and hence require fewer FLOPs. Similar to memory-efficient models, we use the idea of distillation, albeit in a different setting. Specifically, in our case, a compute-heavy teacher which looks at all the frames in the video is used to train a compute-efficient student which looks at only a small fraction of frames in the video. This is in contrast to a typical memory-efficient Teacher-Student setting, wherein both the teacher and the student look at all the frames in the video but the student has fewer parameters. Our work thus complements the research on memory-efficient video classification. We do an extensive evaluation with three types of models for video classification, viz. (i) recurrent models, (ii) cluster-and-aggregate models, and (iii) memory-efficient cluster-and-aggregate models, and show that in each of these cases, a see-it-all teacher can be used to train a compute-efficient see-very-little student. We show that the proposed student network can reduce the inference time by 30% and the number of FLOPs by approximately 90% with a negligible drop in performance.
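
A hedged sketch of the see-it-all teacher / see-very-little student setup: the teacher processes every frame, the student only a small uniformly sampled fraction, and the student is trained with a classification loss plus a term matching the teacher's output. The uniform sampling, the MSE matching term, and the loss weighting alpha are assumptions; teacher and student are arbitrary video classifiers passed in by the caller.

```python
import torch
import torch.nn.functional as F

def distill_step(teacher, student, frames, labels, keep=0.1, alpha=0.5):
    """One training step of frame-level teacher-student distillation.

    frames: (B, T, ...) video clips. The compute-heavy teacher sees all T
    frames; the compute-efficient student sees only ceil(keep * T) frames
    sampled uniformly along the time axis.
    """
    T = frames.shape[1]
    idx = torch.linspace(0, T - 1, steps=max(1, int(keep * T))).long()
    with torch.no_grad():
        t_logits = teacher(frames)           # teacher looks at every frame
    s_logits = student(frames[:, idx])       # student sees a small fraction
    loss_cls = F.cross_entropy(s_logits, labels)
    loss_dist = F.mse_loss(s_logits, t_logits)  # match the teacher's outputs
    return alpha * loss_cls + (1 - alpha) * loss_dist
```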


This article comes from the WeChat public account "CVer" (WeChat ID: CVerNews). Zhihu column: https://zhuanlan.zhihu.com/p/59987434

Special thanks to the CV_arXiv_Daily public account for the source material. The papers covered in this post have also been synced to: https://github.com/zhengzhugithub/CV-arXiv-Daily

You are invited to join the [AI研习社 · CVPR 2019 Conference Sponsorship Program]

By taking part in the CVPR group's activities, such as posting bubbles, notes, and threads and discussing with other group members, you earn the corresponding "research credit" (研值); the member with the highest accumulated research credit in the group has a chance to win a sponsorship slot.

Winners of a sponsorship slot will receive round-trip airfare, hotel accommodation, and registration fees from AI研习社, and will set off for the U.S. together with reporters from AI科技评论 to meet the field's big names face to face, with no logistics to worry about!

Scan the QR code to join.

目标检测

  1. 综述:深度域适应目标检测 标题:Deep Domain Adaptive Object Detection: a Survey 作者:Wanyi Li, Peng Wang 链接:https://arxiv.org/abs/2002.06797

本文共梳理了40篇相关文献,由中科院自动化所学者发布。基于深度学习(DL)的目标检测已经取得了很大的进展,这些方法通常假设有大量的带标签的训练数据可用,并且训练和测试数据从相同的分布中提取。然而,这两个假设在实践中并不总是成立的。深域自适应目标检测(DDAOD)作为一种新的学习范式应运而生。本文综述了深域自适应目标检测方法的研究进展。

  1. 深度学习中的异常实例检测:综述 标题:Anomalous Instance Detection in Deep Learning: A Survey 作者:Saikiran Bulusu, Dawn Song 链接:https://arxiv.org/abs/2003.06979

本文共梳理了119篇相关文献,由雪城大学学者发布。讨论多种异常实例检测方法,并分析了各种方法的相对优势和劣势。

  1. 使用移动摄像机检测移动物体:全面综述 标题:Moving Objects Detection with a Moving Camera: A Comprehensive Review 作者:Marie-Neige Chapel, Thierry Bouwmans 链接:https://arxiv.org/abs/2001.05238

本文共梳理了347篇相关文献。随着移动传感器的兴起,研究移动相机逐渐变为热门方向。本文对不同现有方法进行了识别,并将其分为一个平面或多个两类。在这两个类别中,将各类方法分为8组:全景背景减法,双摄像头,运动补偿,子空间分割,运动分割,平面+视差,多平面和按块分割图像。本文还对公开可用的数据集和评估指标进行了研究。

图像分类

  1. 图像分类中的半监督,自我监督和无监督技术综述 标题:A survey on Semi-, Self- and Unsupervised Techniques in Image Classification 作者:Lars Schmarje, Reinhard Koch 链接:https://arxiv.org/abs/2002.08721

本文共梳理了51篇相关文献。综述了标签较少的图像分类中常用的21种技术和方法。我们比较方法,并确定了三个主要趋势。

图像去噪

  1. 图像去噪深度学习:综述 标题:Deep Learning on Image Denoising: An overview 作者:Chunwei Tian, Chia-Wen Lin 链接:https://arxiv.org/abs/1912.13171

本文梳理了238篇相关文献,由哈尔滨工业大学、广东工业大学、清华大学学者共同发布。不同类型的处理噪声深度学习方法存在巨大差异,而目前很少有相关研究来进行相关总结。本文对图像去噪中不同深度学习技术进行了比较研究,分析不同方法的动机和原理,并在公共去噪数据集进行比较。研究包括:(1). 加白噪声图像的CNN;(2)用于真实噪声图像的CNN;(3)用于盲噪声去噪的CNN;(4)用于混合噪声图像的CNN。

图像分割

  1. 使用深度学习进行图像分割:综述 标题:Image Segmentation Using Deep Learning: A Survey 作者:Shervin Minaee, Demetri Terzopoulos 链接:https://arxiv.org/abs/2001.05566

本文梳理了172篇相关文献,对语义和实例分割文献进行了全面回顾,涵盖了的各种开创性作品,包括全卷积像素标记网络,编码器-解码器体系结构,多尺度以及基于金字塔的方法,递归网络,视觉注意模型以及对抗中的生成模型。

人脸识别

  1. DeepFakes:面部操纵和伪造检测综述 标题:DeepFakes and Beyond: A Survey of Face Manipulation and Fake Detection 作者:Ruben Tolosana, Javier Ortega-Garcia 链接:https://arxiv.org/abs/2001.00179

本文梳理了105篇相关文献,本文对操纵人脸的图像技术(包括DeepFake方法)以及检测此类技术的方法进行了全面综述。论述了四种类型的面部操作:全脸合成、面部身份交换(DeepFakes)、面部属性操作以及面部表情操作。

姿态估计

  1. 目标姿态回顾:从3D边界框检测器到完整的6D姿态估计器 标题:A Review on Object Pose Recovery: from 3D Bounding Box Detectors to Full 6D Pose Estimators 作者:Caner Sahin, Tae-Kyun Kim 链接:https://arxiv.org/abs/2001.10609

本文梳理了206篇相关文献,由伦敦帝国理工学院学者发布。本文对3D边界框检测器到完整的6D姿态估计器的物体姿态恢复方法的进行了首次全面的综述。基于数学模型,将各类方法分为分类,回归,分类与回归,模板匹配和点对特征匹配任务。

行为/动作识别

  1. 基于3D骨架的动作识别学习方法的研究 标题:A Survey on 3D Skeleton-Based Action Recognition Using Learning Method 作者:Bin Ren, Hong Liu 链接:https://arxiv.org/abs/2002.05907

本文梳理了81篇相关文献,由北京大学学者发布。本文强调了动作识别的必要性和3D骨架数据的重要性,然后以数据驱动的方式对基于递归神经网络,基于卷积神经网络和基于图卷积网络的主流动作识别技术进行了全面介绍,这也是第一次对使用3D骨架数据进行基于深度学习的动作识别的全面研究。

人群计数

  1. 基于CNN的密度估算和人群计数:综述 标题:CNN-based Density Estimation and Crowd Counting: A Survey 作者:Guangshuai Gao, Yunhong Wang 链接:https://arxiv.org/abs/2003.12783

本文梳理了222篇相关文献,由北京航空航天大学学者发布,基于CNN的密度图估计方法,调研了220+工作,对人群计数进行了全面系统的研究。同时根据评估指标,在人群统计数据集上选择表现最好的三名,并分析其优缺点。

医学影像

  1. 使用经典和深层神经网络进行的乳房组织病理学图像分析的全面综述 标题:A Comprehensive Review for Breast Histopathology Image Analysis Using Classical and Deep Neural Networks 作者:Xiaomin Zhou, Tao Jiang 链接:https://arxiv.org/abs/2003.12255

本文梳理了180篇相关文献,由东北大学学者发布。对基于人工神经网络的BHIA技术进行了全面概述,将BHIA系统分为经典和深度神经网络以进行深入研究,分析现有模型以发现最合适的算法,并提供可公开访问的数据集。

  1. 使用深度神经网络的医学图像配准:全面综述 标题:Medical Image Registration Using Deep Neural Networks: A Comprehensive Review 作者:Hamid Reza Boveiri, Ali Reza MehdiZadeh 链接:https://arxiv.org/abs/2002.03401

本文梳理了117篇相关文献,对使用深度神经网络进行医学图像配准的最新文献进行了全面回顾,系统地涵盖了该领域的相关作品,包括关键概念,统计分析,关键技术,主要贡献,挑战和未来方向。

  1. 迈向自动威胁检测:X射线安全成像中深度学习进展综述 标题:Towards Automatic Threat Detection: A Survey of Advances of Deep Learning within X-ray Security Imaging 作者:Samet Akcay, Toby Breckon 链接:https://arxiv.org/abs/2001.01293

本文梳理了151篇相关文献,由英国杜伦大学学者发布。本文分常规机器学习和当代深度学习两类来回顾X射线安全成像算法。将深度学习方法分为有监督,半监督和无监督学习,着重论述分类,检测,分割和异常检测任务,同时包含有完善的X射线数据集。

  1. 用于计算组织病理学的深度神经网络模型综述 标题:Deep neural network models for computational histopathology: A survey 作者:Chetan L. Srinidhi, Anne L. Martel 链接:https://arxiv.org/abs/1912.12378

本文梳理了130篇相关文献,由多伦多大学学者发布。本文对组织病理学图像分析中使用的最新深度学习方法进行了全面回顾,包括有监督,弱监督,无监督,迁移学习等领域,并总结了几个现有的开放数据集。

三维重建

  1. 外部形状对3D内部结构预测综述 标题:A Survey On 3D Inner Structure Prediction from its Outer Shape 作者:Mohamed Mejri, Cédric Pradalier 链接:https://arxiv.org/abs/2002.04571

本文梳理了81篇相关文献,由北京大学学者发布。由于过去与骨架数据相关内容很少,本文是第一篇针对使用3D骨架数据进行基于深度学习的动作识别进行全面讨论的研究。本文突出了动作识别和3D骨架数据的重要性,以数据驱动的方式对基于递归神经网络、卷积神经网络和图卷积网络的主流动作识别技术进行了全面介绍。并介绍了最大的3D骨架数据集NTU-RGB+D及其新版本NTU-RGB+D 120,并论述了几种现有的顶级算法。

三维点云

  1. 点云的无目标配准综述 标题:Target-less registration of point clouds: A review 作者:Yue Pan

本文对48篇文献进行了梳理,总结了无目标点云配准的基本工作,回顾了三种常用的配准方法,即基于特征匹配的方法,迭代最近点算法和随机假设,并分析了这些方法的优缺点,介绍它们的常见应用场景。 链接:https://arxiv.org/abs/1912.12756

OCR:

  1. 手写光学字符识别(OCR):综合系统文献综述(SLR) 标题:Handwritten Optical Character Recognition (OCR): A Comprehensive Systematic Literature Review (SLR) 作者:Jamshed Memon, Rizwan Ahmed Khan 链接:https://arxiv.org/abs/2001.00139

本文对142篇相关文献进行了梳理,总结了有关OCR的研究,综述了2000年至2018年之间发布的研究文章,介绍OCR的最新结果和技术,并分析研究差距,以总结研究方向。

深度depth相关:

  1. 基于深度学习的单目深度估计:综述 标题:Monocular Depth Estimation Based On Deep Learning: An Overview 作者:Chaoqiang Zhao, Feng Qian 链接:https://arxiv.org/abs/2003.06620

本文对119篇相关文献进行了梳理,由华东理工大学学者发布。随着深度神经网络的迅速发展,基于深度学习的单眼深度估计已得到广泛研究。为了提高深度估计的准确性,提出了各种网络框架,损失函数和训练策略。因此,本文综述了当前基于深度学习的单眼深度估计方法,总结了几种基于深度学习的深度估计中广泛使用的数据集和评价指标,同时根据不同的训练方式回顾了一些有代表性的现有方法:有监督,无监督和半监督。

CNN

  1. 卷积神经网络的概述论文:分析、应用和展望 标题:A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects 作者:Zewen Li, Wenjie Yang, Shouheng Peng, Fan Liu 链接:https://arxiv.org/abs/2004.02806

本文对119篇相关文献进行了梳理,由华东理工大学学者发布。本文旨在在卷积神经网络这个快速增长的领域中尽可能提供新颖的想法和前景,不仅涉及二维卷积,而且涉及一维和多维卷积。首先,本文简要介绍了CNN的历史并概述了CNN发展,介绍经典CNN模型,重点论述使它们达到SOTA的关键因素,并通过实验分析提供了一些经验法则,最后对一维,二维和多维卷积的应用进行了概述。

视觉常识/其他

  1. 神经网络分类器的信息平面分析研究述评 标题:On Information Plane Analyses of Neural Network Classifiers -- A Review 作者:Bernhard C. Geiger 链接:https://arxiv.org/abs/2003.09671

  2. 低功耗深度学习和计算机视觉方法的概述 标题:A Survey of Methods for Low-Power Deep Learning and Computer Vision 作者:Abhinav Goel, George K. Thiruvathukal 链接:https://arxiv.org/abs/2003.11066

  3. 深度学习遇到数据对齐时:深度注册网络(DRN)评述 标题:When Deep Learning Meets Data Alignment: A Review on Deep Registration Networks (DRNs) 作者:Victor Villena-Martinez, Robert B. Fisher 链接:https://arxiv.org/abs/2003.03167

  4. 面向消费设备的无限制掌纹识别:文献综述 标题:Towards Unconstrained Palmprint Recognition on Consumer Devices: a Literature Review 作者:Adrian-S. Ungureanu, Peter Corcoran 链接:https://arxiv.org/abs/2003.00737

  5. 基于地面纹理的本地化功能-综述 标题:Features for Ground Texture Based Localization -- A Survey 作者:Jan Fabian Schmid, Rudolf Mester 链接:https://arxiv.org/abs/2002.11948

  6. 从观看到移动:视觉室内导航(VIN)学习综述 标题:From Seeing to Moving: A Survey on Learning for Visual Indoor Navigation (VIN) 作者:Xin Ye, Yezhou Yang 链接:https://arxiv.org/abs/2002.11310
