Title: Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task

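Abstract: We address a core problem of computer vision: the detection and description of 2D feature points for image matching. For a long time, hand-crafted designs such as the seminal SIFT algorithm were unsurpassed in accuracy and efficiency. Recently, learned feature detectors have emerged that implement detection and description with neural networks. Training these networks usually means optimizing low-level matching scores, often by pre-defining sets of image patches that should or should not match, or that should or should not contain key points. However, higher accuracy on these low-level matching scores does not necessarily translate into better performance on high-level vision tasks. We propose a new training methodology that embeds the feature detector in a complete vision pipeline and trains the learnable parameters end to end. We overcome the discrete nature of key point selection and descriptor matching using principles from reinforcement learning. As an example, we address relative pose estimation between a pair of images. We demonstrate that the accuracy of a state-of-the-art learning-based feature detector can be increased when it is trained for the task it is supposed to solve at test time. Our training methodology places few restrictions on the task to be learned and works for any architecture that predicts key point heat maps and descriptors for key point locations.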
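The abstract does not spell out how reinforcement learning handles the discrete key point selection. One common way to get gradients through a discrete choice is the score-function (REINFORCE) estimator, sketched below in plain Python. This is an illustrative assumption, not the authors' implementation: the function names, the flat list of heat-map logits, and the black-box `task_loss` callback are all hypothetical, and the sample-mean baseline is a simple variance-reduction choice.

```python
import math
import random


def softmax(logits):
    """Turn a flat list of heat-map logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_gradient(logits, task_loss, num_samples=2000, seed=0):
    """Monte Carlo estimate of d E[loss] / d logits (score-function trick).

    `task_loss(index)` is a hypothetical black-box loss of the downstream
    pipeline when the key point at `index` is selected. It does not need
    to be differentiable.
    """
    rng = random.Random(seed)
    probs = softmax(logits)
    indices = list(range(len(logits)))
    # Sample discrete key point choices from the heat-map distribution.
    samples = rng.choices(indices, weights=probs, k=num_samples)
    losses = [task_loss(i) for i in samples]
    # Baseline: mean sampled loss, subtracted to reduce variance.
    baseline = sum(losses) / num_samples
    grad = [0.0] * len(logits)
    for i, loss in zip(samples, losses):
        advantage = loss - baseline
        # For a softmax, d log p(i) / d logit_j = 1[i == j] - p_j.
        for j in indices:
            grad[j] += advantage * ((1.0 if i == j else 0.0) - probs[j])
    return [g / num_samples for g in grad]


if __name__ == "__main__":
    # Toy task: selecting location 2 gives zero loss, anything else costs 1.
    grad = reinforce_gradient([0.0, 0.0, 0.0, 0.0], lambda i: 0.0 if i == 2 else 1.0)
    print(grad)
```

In the toy task, the estimated gradient for logit 2 comes out negative and the others positive, so a gradient-descent step raises the probability of the location that lowers the task loss.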


Related content

CVPR is short for the IEEE Conference on Computer Vision and Pattern Recognition. Organized by the IEEE, it is a top conference in the field of computer vision and pattern recognition.


Human pose estimation, the process of recognizing human keypoints in a given image, is one of the most important tasks in computer vision and has a wide range of applications, including movement diagnostics, surveillance, and self-driving vehicles. The accuracy of human keypoint prediction keeps improving thanks to the rapid development of deep learning. Most existing methods solve human pose estimation by generating heatmaps in which the ith heatmap indicates the location confidence of the ith keypoint. In this paper, we introduce novel network structures, referred to as multiresolution representation learning, for human keypoint prediction. At different resolutions in the learning process, our networks branch off and use extra layers to learn heatmap generation. We first consider architectures that generate the multiresolution heatmaps after obtaining the lowest-resolution feature maps. Our second approach allows learning during feature extraction, with heatmaps generated at each resolution of the feature extractor. The first and second approaches are referred to as multi-resolution heatmap learning and multi-resolution feature map learning, respectively. Our architectures are simple yet effective, achieving good performance. We conducted experiments on two common benchmarks for human pose estimation: the MS-COCO and MPII datasets.
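To make the heatmap representation concrete, here is a minimal sketch of how keypoints are typically read back from such heatmaps: take the highest-scoring cell of the ith heatmap as the location of the ith keypoint. The function name and the list-of-grids input format are assumptions for illustration. Practical systems usually add sub-pixel refinement, which is omitted here.

```python
def decode_keypoints(heatmaps):
    """Return one (x, y, confidence) triple per heatmap.

    `heatmaps` is a list of 2D grids (lists of rows). The i-th grid holds
    the location confidence of the i-th keypoint.
    """
    keypoints = []
    for grid in heatmaps:
        best = (0, 0, float("-inf"))
        for y, row in enumerate(grid):
            for x, score in enumerate(row):
                # Keep the first cell with the strictly highest confidence.
                if score > best[2]:
                    best = (x, y, score)
        keypoints.append(best)
    return keypoints


if __name__ == "__main__":
    heatmaps = [
        [[0.1, 0.2], [0.9, 0.3]],  # keypoint 0 peaks at x=0, y=1
        [[0.0, 0.8], [0.1, 0.2]],  # keypoint 1 peaks at x=1, y=0
    ]
    print(decode_keypoints(heatmaps))
```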


Title: Weakly-Supervised Salient Object Detection via Scribble Annotations

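Abstract: Compared with laborious pixel-wise dense labeling, labeling data with scribbles is much easier, taking only 1-2 seconds per image. However, learning salient object detection from scribble labels has not yet been explored. In this paper, we propose a weakly-supervised salient object detection model that learns saliency from such annotations. To this end, we first relabel an existing large-scale salient object detection dataset with scribbles, yielding the S-DUTS dataset. Because scribbles do not capture object structure and detail, training directly on scribble labels leads to saliency maps with poor boundary localization. To mitigate this problem, we propose an auxiliary edge detection task that localizes object edges explicitly, and a gated structure-aware loss that constrains the scope of the structure to be recovered. In addition, we design a scribble boosting scheme that iteratively consolidates our scribble annotations, which are then used as supervision to learn high-quality saliency maps. We also present a new metric, termed the saliency structure measure, which measures the structure alignment of predicted saliency maps and is more consistent with human perception. Extensive experiments on six benchmark datasets show that our method not only outperforms existing weakly-supervised and unsupervised methods, but is also on par with several fully-supervised state-of-the-art models.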


This paper focuses on two fundamental tasks of graph analysis: community detection and node representation learning, which capture the global and local structures of graphs, respectively. In the current literature, these two tasks are usually independently studied while they are actually highly correlated. We propose a probabilistic generative model called vGraph to learn community membership and node representation collaboratively. Specifically, we assume that each node can be represented as a mixture of communities, and each community is defined as a multinomial distribution over nodes. Both the mixing coefficients and the community distribution are parameterized by the low-dimensional representations of the nodes and communities. We designed an effective variational inference algorithm which regularizes the community membership of neighboring nodes to be similar in the latent space. Experimental results on multiple real-world graphs show that vGraph is very effective in both community detection and node representation learning, outperforming many competitive baselines in both tasks. We show that the framework of vGraph is quite flexible and can be easily extended to detect hierarchical communities.
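The mixture assumption in this abstract can be sketched numerically. The code below assumes that both the mixing coefficients p(c | w) and the community distributions p(v | c) are softmaxes over dot products of low-dimensional embeddings. The abstract only says they are "parameterized by" the representations, so the softmax form, the function names, and the toy vectors are illustrative assumptions rather than the paper's definition.

```python
import math


def softmax(scores):
    """Normalize a list of scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def neighbor_distribution(node_vec, community_vecs, context_vecs):
    """Compute p(c | w) and p(v | w) = sum_c p(c | w) * p(v | c) for one node w."""
    # Mixing coefficients: how strongly node w belongs to each community.
    mixing = softmax([dot(node_vec, c) for c in community_vecs])
    result = [0.0] * len(context_vecs)
    for weight, community in zip(mixing, community_vecs):
        # Each community is a multinomial distribution over nodes.
        per_node = softmax([dot(community, v) for v in context_vecs])
        for i, p in enumerate(per_node):
            result[i] += weight * p
    return mixing, result


if __name__ == "__main__":
    mixing, dist = neighbor_distribution(
        [0.0, 0.0],                            # embedding of node w
        [[1.0, 0.0], [0.0, 1.0]],              # two community embeddings
        [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]],  # three candidate neighbor nodes
    )
    print(mixing, dist)
```

Community membership can then be read off as the community with the largest mixing coefficient for each node.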


Despite huge success in the image domain, modern detection models such as Faster R-CNN have not been used nearly as much for video analysis. This is arguably due to the fact that detection models are designed to operate on single frames and as a result do not have a mechanism for learning motion representations directly from video. We propose a learning procedure that allows detection models such as Faster R-CNN to learn motion features directly from the RGB video data while being optimized with respect to a pose estimation task. Given a pair of video frames (Frame A and Frame B), we force our model to predict human pose in Frame A using the features from Frame B. We do so by leveraging deformable convolutions across space and time. Our network learns to spatially sample features from Frame B in order to maximize pose detection accuracy in Frame A. This naturally encourages our network to learn motion offsets encoding the spatial correspondences between the two frames. We refer to these motion offsets as DiMoFs (Discriminative Motion Features). In our experiments we show that our training scheme helps learn effective motion cues, which can be used to estimate and localize salient human motion. Furthermore, we demonstrate that as a byproduct, our model also learns features that lead to improved pose detection in still images and better keypoint tracking. Finally, we show how to leverage our learned model for the tasks of spatiotemporal action localization and fine-grained action recognition.


While most steps in the modern object detection methods are learnable, the region feature extraction step remains largely hand-crafted, featured by RoI pooling methods. This work proposes a general viewpoint that unifies existing region feature extraction methods and a novel method that is end-to-end learnable. The proposed method removes most heuristic choices and outperforms its RoI pooling counterparts. It moves further towards fully learnable object detection.
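For context on the hand-crafted baseline this abstract refers to, here is a minimal sketch of classic RoI max pooling: the region is split into a fixed grid of bins and each bin keeps its maximum. It illustrates the baseline only, not the learnable method the paper proposes. The function name, the integer RoI format, and the floor/ceil bin edges are simplifying assumptions.

```python
def roi_max_pool(feature_map, roi, out_size):
    """Max-pool one region of a 2D feature map into an out_size x out_size grid.

    `roi` is (x0, y0, x1, y1) in integer feature-map coordinates, with x1 and
    y1 exclusive. The region must be at least one cell wide and tall.
    """
    x0, y0, x1, y1 = roi
    width, height = x1 - x0, y1 - y0
    pooled = []
    for by in range(out_size):
        # Bin edges: floor for the start, ceil for the end, so bins cover the RoI.
        ys = y0 + (by * height) // out_size
        ye = y0 + -(-((by + 1) * height) // out_size)
        row = []
        for bx in range(out_size):
            xs = x0 + (bx * width) // out_size
            xe = x0 + -(-((bx + 1) * width) // out_size)
            row.append(
                max(feature_map[y][x] for y in range(ys, ye) for x in range(xs, xe))
            )
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    # 4x4 feature map with values 0..15 in row-major order.
    fmap = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(roi_max_pool(fmap, (0, 0, 4, 4), 2))
```

The bin layout and the max operator are exactly the kind of heuristic choices the abstract says the proposed method removes.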

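Related news
【泡泡图灵智库】Detect-SLAM: Object Detection and SLAM Benefit Each Other
泡泡机器人SLAM
9+ reads · June 28, 2019
【泡泡图灵智库】Robust hierarchical large-scale localization (CVPR)
【泡泡图灵智库】Camera motion estimation based on a geometric consistency network
Paper notes: Feature Selective Networks for Object Detection
统计学习与视觉计算组
18+ reads · July 26, 2018
Focal Loss for Dense Object Detection
统计学习与视觉计算组
11+ reads · March 15, 2018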
Related papers
Simple Multi-Resolution Representation Learning for Human Pose Estimation
Trung Q. Tran, Giang V. Nguyen, Daeyoung Kim
5+ reads · April 14, 2020
Zixin Luo, Lei Zhou, Xuyang Bai, Hongkai Chen, Jiahui Zhang, Yao Yao, Shiwei Li, Tian Fang, Long Quan
6+ reads · March 23, 2020
vGraph: A Generative Model for Joint Community Detection and Node Representation Learning
Fan-Yun Sun, Meng Qu, Jordan Hoffmann, Chin-Wei Huang, Jian Tang
13+ reads · September 17, 2019
Learning Discriminative Motion Features Through Detection
Gedas Bertasius, Christoph Feichtenhofer, Du Tran, Jianbo Shi, Lorenzo Torresani
3+ reads · December 11, 2018
Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, Trevor Darrell
7+ reads · December 5, 2018
Polarity Loss for Zero-shot Object Detection
Shafin Rahman, Salman Khan, Nick Barnes
3+ reads · November 22, 2018
Multi-task Deep Reinforcement Learning with PopArt
Matteo Hessel, Hubert Soyer, Lasse Espeholt, Wojciech Czarnecki, Simon Schmitt, Hado van Hasselt
3+ reads · September 12, 2018
Han Hu, Jiayuan Gu, Zheng Zhang, Jifeng Dai, Yichen Wei
3+ reads · June 14, 2018
Jiayuan Gu, Han Hu, Liwei Wang, Yichen Wei, Jifeng Dai
4+ reads · March 19, 2018
Alexander Wong, Mohammad Javad Shafiee, Francis Li, Brendan Chwyl
7+ reads · February 19, 2018