Goal-Oriented Gaze Estimation for Zero-Shot Learning

Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen classes. Because this semantic knowledge is built on attributes shared across classes, and those attributes are highly local, a strong prior for localizing object attributes benefits visual-semantic embedding. Interestingly, when recognizing unseen images, humans also automatically gaze at regions that carry semantic clues. We therefore introduce a novel goal-oriented gaze estimation module (GEM) that improves discriminative attribute localization based on class-level attributes for ZSL. The aim is to predict actual human gaze locations, which yields the visual attention regions used to recognize a novel object under the guidance of an attribute description. Specifically, task-dependent attention is learned with the goal-oriented GEM, while the global image features are simultaneously optimized through the regression of local attribute features. Experiments on three ZSL benchmarks, i.e., CUB, SUN, and AWA2, show that the proposed method is superior to or competitive with state-of-the-art ZSL methods. An ablation analysis on the real gaze data CUB-VWSW also validates the benefits and accuracy of the gaze estimation module. This work suggests that collecting human gaze datasets and developing automatic gaze estimation algorithms hold promise for high-level computer vision tasks. The code is available at https://github.com/osierboy/GEM-ZSL.
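To make the attribute-based transfer concrete, the following is a minimal sketch of generic attribute-matching inference in ZSL. It is not the GEM architecture and is not taken from the linked repository. The class names, attribute vectors, and the functions `cosine` and `predict_unseen` are hypothetical and chosen only for illustration. It assumes that some model has already regressed an attribute vector from an image, and it then assigns the unseen class whose class-level attribute vector is most similar.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_unseen(predicted_attributes, class_attributes):
    """Return the class whose attribute vector best matches the predicted attributes."""
    return max(
        class_attributes,
        key=lambda name: cosine(predicted_attributes, class_attributes[name]),
    )

# Hypothetical class-level attributes: [has_stripes, has_wings, lives_in_water]
unseen_classes = {
    "zebra": [1.0, 0.0, 0.0],
    "penguin": [0.0, 1.0, 1.0],
    "dolphin": [0.0, 0.0, 1.0],
}

# Hypothetical attribute scores regressed from one test image
image_attributes = [0.1, 0.8, 0.7]
print(predict_unseen(image_attributes, unseen_classes))  # prints "penguin"
```

No training image of an unseen class is needed at inference time: only its attribute description is. This is why accurate attribute localization in the image matters for the quality of the regressed attribute vector.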
