The way that information propagates in neural networks is of great importance. In this paper, we propose Path Aggregation Network (PANet) aiming at boosting information flow in proposal-based instance segmentation framework. Specifically, we enhance the entire feature hierarchy with accurate localization signals in lower layers by bottom-up path augmentation, which shortens the information path between lower layers and topmost feature. We present adaptive feature pooling, which links feature grid and all feature levels to make useful information in each feature level propagate directly to following proposal subnetworks. A complementary branch capturing different views for each proposal is created to further improve mask prediction. These improvements are simple to implement, with subtle extra computational overhead. Our PANet reaches the 1st place in the COCO 2017 Challenge Instance Segmentation task and the 2nd place in Object Detection task without large-batch training. It is also state-of-the-art on MVD and Cityscapes.

3
下载
关闭预览

相关内容

《计算机信息》杂志发表高质量的论文,扩大了运筹学和计算的范围,寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文,以及描述新的和有用的软件工具的论文。官网链接:https://pubsonline.informs.org/journal/ijoc

We consider an interesting problem-salient instance segmentation in this paper. Other than producing bounding boxes, our network also outputs high-quality instance-level segments. Taking into account the category-independent property of each target, we design a single stage salient instance segmentation framework, with a novel segmentation branch. Our new branch regards not only local context inside each detection window but also its surrounding context, enabling us to distinguish the instances in the same scope even with obstruction. Our network is end-to-end trainable and runs at a fast speed (40 fps when processing an image with resolution 320x320). We evaluate our approach on a publicly available benchmark and show that it outperforms other alternative solutions. We also provide a thorough analysis of the design choices to help readers better understand the functions of each part of our network. The source code can be found at \url{https://github.com/RuochenFan/S4Net}.

0
7
下载
预览

In this paper, we propose a unified panoptic segmentation network (UPSNet) for tackling the newly proposed panoptic segmentation task. On top of a single backbone residual network, we first design a deformable convolution based semantic segmentation head and a Mask R-CNN style instance segmentation head which solve these two subtasks simultaneously. More importantly, we introduce a parameter-free panoptic head which solves the panoptic segmentation via pixel-wise classification. It first leverages the logits from the previous two heads and then innovatively expands the representation for enabling prediction of an extra unknown class which helps better resolve the conflicts between semantic and instance segmentation. Additionally, it handles the challenge caused by the varying number of instances and permits back propagation to the bottom modules in an end-to-end manner. Extensive experimental results on Cityscapes, COCO and our internal dataset demonstrate that our UPSNet achieves state-of-the-art performance with much faster inference.

0
3
下载
预览

We present an end-to-end method for the task of panoptic segmentation. The method makes instance segmentation and semantic segmentation predictions in a single network, and combines these outputs using heuristics to create a single panoptic segmentation output. The architecture consists of a ResNet-50 feature extractor shared by the semantic segmentation and instance segmentation branch. For instance segmentation, a Mask R-CNN type of architecture is used, while the semantic segmentation branch is augmented with a Pyramid Pooling Module. Results for this method are submitted to the COCO and Mapillary Joint Recognition Challenge 2018. Our approach achieves a PQ score of 17.6 on the Mapillary Vistas validation set and 27.2 on the COCO test-dev set.

0
4
下载
预览

Semantic segmentation requires both rich spatial information and sizeable receptive field. However, modern approaches usually compromise spatial resolution to achieve real-time inference speed, which leads to poor performance. In this paper, we address this dilemma with a novel Bilateral Segmentation Network (BiSeNet). We first design a Spatial Path with a small stride to preserve the spatial information and generate high-resolution features. Meanwhile, a Context Path with a fast downsampling strategy is employed to obtain sufficient receptive field. On top of the two paths, we introduce a new Feature Fusion Module to combine features efficiently. The proposed architecture makes a right balance between the speed and segmentation performance on Cityscapes, CamVid, and COCO-Stuff datasets. Specifically, for a 2048x1024 input, we achieve 68.4% Mean IOU on the Cityscapes test dataset with speed of 105 FPS on one NVIDIA Titan XP card, which is significantly faster than the existing methods with comparable performance.

0
4
下载
预览

Weakly-supervised semantic segmentation under image tags supervision is a challenging task as it directly associates high-level semantic to low-level appearance. To bridge this gap, in this paper, we propose an iterative bottom-up and top-down framework which alternatively expands object regions and optimizes segmentation network. We start from initial localization produced by classification networks. While classification networks are only responsive to small and coarse discriminative object regions, we argue that, these regions contain significant common features about objects. So in the bottom-up step, we mine common object features from the initial localization and expand object regions with the mined features. To supplement non-discriminative regions, saliency maps are then considered under Bayesian framework to refine the object regions. Then in the top-down step, the refined object regions are used as supervision to train the segmentation network and to predict object masks. These object masks provide more accurate localization and contain more regions of object. Further, we take these object masks as initial localization and mine common object features from them. These processes are conducted iteratively to progressively produce fine object masks and optimize segmentation networks. Experimental results on Pascal VOC 2012 dataset demonstrate that the proposed method outperforms previous state-of-the-art methods by a large margin.

0
3
下载
预览

Weakly supervised instance segmentation with image-level labels, instead of expensive pixel-level masks, remains unexplored. In this paper, we tackle this challenging problem by exploiting class peak responses to enable a classification network for instance mask extraction. With image labels supervision only, CNN classifiers in a fully convolutional manner can produce class response maps, which specify classification confidence at each image location. We observed that local maximums, i.e., peaks, in a class response map typically correspond to strong visual cues residing inside each instance. Motivated by this, we first design a process to stimulate peaks to emerge from a class response map. The emerged peaks are then back-propagated and effectively mapped to highly informative regions of each object instance, such as instance boundaries. We refer to the above maps generated from class peak responses as Peak Response Maps (PRMs). PRMs provide a fine-detailed instance-level representation, which allows instance masks to be extracted even with some off-the-shelf methods. To the best of our knowledge, we for the first time report results for the challenging image-level supervised instance segmentation task. Extensive experiments show that our method also boosts weakly supervised pointwise localization as well as semantic segmentation performance, and reports state-of-the-art results on popular benchmarks, including PASCAL VOC 2012 and MS COCO.

0
3
下载
预览

In this paper, we propose a simple but effective semantic-based aggregation (SBA) method. The proposed SBA utilizes the discriminative filters of deep convolutional layers as semantic detectors. Moreover, we propose the effective unsupervised strategy to select some semantic detectors to generate the "probabilistic proposals", which highlight certain discriminative pattern of objects and suppress the noise of background. The final global SBA representation could then be acquired by aggregating the regional representations weighted by the selected "probabilistic proposals" corresponding to various semantic content. Our unsupervised SBA is easy to generalize and achieves excellent performance on various tasks. We conduct comprehensive experiments and show that our unsupervised SBA outperforms the state-of-the-art unsupervised and supervised aggregation methods on image retrieval, place recognition and cloud classification.

0
8
下载
预览

This work presents a region-growing image segmentation approach based on superpixel decomposition. From an initial contour-constrained over-segmentation of the input image, the image segmentation is achieved by iteratively merging similar superpixels into regions. This approach raises two key issues: (1) how to compute the similarity between superpixels in order to perform accurate merging and (2) in which order those superpixels must be merged together. In this perspective, we firstly introduce a robust adaptive multi-scale superpixel similarity in which region comparisons are made both at content and common border level. Secondly, we propose a global merging strategy to efficiently guide the region merging process. Such strategy uses an adpative merging criterion to ensure that best region aggregations are given highest priorities. This allows to reach a final segmentation into consistent regions with strong boundary adherence. We perform experiments on the BSDS500 image dataset to highlight to which extent our method compares favorably against other well-known image segmentation algorithms. The obtained results demonstrate the promising potential of the proposed approach.

0
4
下载
预览

Semantic image segmentation is one of the most challenged tasks in computer vision. In this paper, we propose a highly fused convolutional network, which consists of three parts: feature downsampling, combined feature upsampling and multiple predictions. We adopt a strategy of multiple steps of upsampling and combined feature maps in pooling layers with its corresponding unpooling layers. Then we bring out multiple pre-outputs, each pre-output is generated from an unpooling layer by one-step upsampling. Finally, we concatenate these pre-outputs to get the final output. As a result, our proposed network makes highly use of the feature information by fusing and reusing feature maps. In addition, when training our model, we add multiple soft cost functions on pre-outputs and final outputs. In this way, we can reduce the loss reduction when the loss is back propagated. We evaluate our model on three major segmentation datasets: CamVid, PASCAL VOC and ADE20K. We achieve a state-of-the-art performance on CamVid dataset, as well as considerable improvements on PASCAL VOC dataset and ADE20K dataset

0
3
下载
预览

Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build "fully convolutional" networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a novel architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes one third of a second for a typical image.

0
3
下载
预览
小贴士
相关论文
S4Net: Single Stage Salient-Instance Segmentation
Ruochen Fan,Ming-Ming Cheng,Qibin Hou,Tai-Jiang Mu,Jingdong Wang,Shi-Min Hu
7+阅读 · 2019年4月10日
UPSNet: A Unified Panoptic Segmentation Network
Yuwen Xiong,Renjie Liao,Hengshuang Zhao,Rui Hu,Min Bai,Ersin Yumer,Raquel Urtasun
3+阅读 · 2019年1月12日
Panoptic Segmentation with a Joint Semantic and Instance Segmentation Network
Daan de Geus,Panagiotis Meletis,Gijs Dubbelman
4+阅读 · 2018年9月6日
BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation
Changqian Yu,Jingbo Wang,Chao Peng,Changxin Gao,Gang Yu,Nong Sang
4+阅读 · 2018年8月2日
Xiang Wang,Shaodi You,Xi Li,Huimin Ma
3+阅读 · 2018年6月12日
Yanzhao Zhou,Yi Zhu,Qixiang Ye,Qiang Qiu,Jianbin Jiao
3+阅读 · 2018年4月3日
Jian Xu,Chunheng Wang,Chengzuo Qi,Cunzhao Shi,Baihua Xiao
8+阅读 · 2018年4月3日
Mahaman Sani Chaibou,Pierre-Henri Conze,Karim Kalti,Basel Solaiman,Mohamed Ali Mahjoub
4+阅读 · 2018年3月17日
Tao Yang,Yan Wu,Junqiao Zhao,Linting Guan
3+阅读 · 2018年1月4日
Jonathan Long,Evan Shelhamer,Trevor Darrell
3+阅读 · 2015年3月8日
相关VIP内容
相关资讯
内涵网络嵌入:Content-rich Network Embedding
我爱读PAMI
4+阅读 · 2019年11月5日
Hierarchically Structured Meta-learning
CreateAMind
9+阅读 · 2019年5月22日
TorchSeg:基于pytorch的语义分割算法开源了
极市平台
19+阅读 · 2019年1月28日
Unsupervised Learning via Meta-Learning
CreateAMind
26+阅读 · 2019年1月3日
meta learning 17年:MAML SNAIL
CreateAMind
8+阅读 · 2019年1月2日
disentangled-representation-papers
CreateAMind
20+阅读 · 2018年9月12日
《pyramid Attention Network for Semantic Segmentation》
统计学习与视觉计算组
39+阅读 · 2018年8月30日
【推荐】RNN最新研究进展综述
机器学习研究会
14+阅读 · 2018年1月6日
ResNet, AlexNet, VGG, Inception:各种卷积网络架构的理解
全球人工智能
14+阅读 · 2017年12月17日
计算机视觉近一年进展综述
机器学习研究会
6+阅读 · 2017年11月25日
Top