A Deep Learning-based Global and Segmentation-based Semantic Feature Fusion Approach for Indoor Scene Classification

Indoor scene classification has become an important task in perception modules and is widely used in various applications. However, intra-category variability and inter-category similarity continue to limit model performance, which motivates new types of features that yield a more meaningful scene representation. A semantic segmentation mask provides pixel-level information about the objects present in the scene, which makes it a promising source for a more meaningful local representation of the scene. Therefore, this work proposes a novel approach that uses a semantic segmentation mask to obtain a 2D spatial layout of the object categories across the scene, termed segmentation-based semantic features (SSFs). For each object category, these features encode the pixel count, the 2D average position, and the respective standard deviation values. Moreover, a two-branch network, GS2F2App, is proposed that exploits CNN-based global features extracted from RGB images together with segmentation-based features extracted from the proposed SSFs. GS2F2App was evaluated on two indoor scene benchmark datasets, SUN RGB-D and NYU Depth V2, achieving state-of-the-art results on both datasets.
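As an illustration of the SSF description above, the following is a minimal sketch of how per-category statistics could be computed from a segmentation mask. It is not the authors' implementation: the function name `segmentation_semantic_features`, the feature ordering, the use of raw pixel coordinates, the population standard deviation, and the all-zero row for absent categories are all assumptions made for this example.

```python
import numpy as np


def segmentation_semantic_features(mask: np.ndarray, num_classes: int) -> np.ndarray:
    """Compute per-category layout statistics from a 2D label mask.

    Hypothetical helper: returns an array of shape (num_classes, 5) whose
    columns are [pixel_count, mean_x, mean_y, std_x, std_y]. Categories that
    do not appear in the mask keep an all-zero row (an assumption).
    """
    features = np.zeros((num_classes, 5), dtype=np.float64)
    for category in range(num_classes):
        # Row (y) and column (x) indices of every pixel labelled `category`.
        ys, xs = np.nonzero(mask == category)
        if ys.size == 0:
            continue
        features[category, 0] = ys.size      # pixel count
        features[category, 1] = xs.mean()    # average x position
        features[category, 2] = ys.mean()    # average y position
        features[category, 3] = xs.std()     # spread along x (population std)
        features[category, 4] = ys.std()     # spread along y (population std)
    return features


if __name__ == "__main__":
    # Toy 2x3 mask with two of three categories present.
    toy_mask = np.array([[0, 0, 1],
                         [0, 1, 1]])
    print(segmentation_semantic_features(toy_mask, num_classes=3))
```

In practice the counts and coordinates would likely be normalized by the image size so that features are comparable across resolutions; the abstract does not say whether this is done, so the sketch leaves the values unnormalized.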
