Disentanglement for Discriminative Visual Recognition

Recent successes of deep learning-based recognition rely on preserving the content related to the main-task label. However, how to explicitly remove noisy signals in a controllable manner for better generalization remains an open issue. For instance, various factors such as identity-specific attributes, pose, illumination, and expression affect the appearance of face images. Disentangling the identity-specific factors is potentially beneficial for facial expression recognition (FER). This chapter systematically summarizes the detrimental factors as task-relevant/irrelevant semantic variations and unspecified latent variation. These problems are cast as either a deep metric learning problem or an adversarial minimax game in the latent space. For the former, a generalized adaptive (N+M)-tuplet clusters loss function, together with identity-aware hard-negative mining and an online positive mining scheme, can be used for identity-invariant FER. Better FER performance can be achieved by jointly optimizing the deep metric loss and the softmax loss in a unified framework with two fully connected layer branches. For the latter, an end-to-end conditional adversarial network can be equipped with the ability to decompose an input sample into three complementary parts. The discriminative representation inherits the desired invariance property guided by prior knowledge of the task, and is marginally independent of the task-relevant/irrelevant semantic and latent variations. The framework achieves top performance on a series of tasks, including lighting-, makeup-, and disguise-tolerant face recognition and facial attribute recognition. Overall, the chapter summarizes popular and practical solutions for disentanglement to achieve more discriminative visual recognition.
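
To make the idea of jointly optimizing a metric loss and a softmax loss more concrete, the following is a minimal sketch in plain Python. It is not the chapter's (N+M)-tuplet clusters loss: the function `cluster_margin_loss`, the `margin` and `weight` parameters, and the use of a simple centroid are illustrative assumptions that stand in for the general pattern of pulling N positive samples together while pushing M negative samples away, with the result added to a classification loss.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over `logits` for the true class `label`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_margin_loss(positives, negatives, margin=1.0):
    """Simplified stand-in for a tuplet-style metric loss (an assumption,
    not the chapter's formulation): pull the N positives toward their
    centroid and push the M negatives at least `margin` away from it."""
    dim = len(positives[0])
    center = [sum(p[i] for p in positives) / len(positives) for i in range(dim)]
    pull = sum(sq_dist(p, center) for p in positives) / len(positives)
    push = sum(
        max(0.0, margin - math.sqrt(sq_dist(n, center))) for n in negatives
    ) / len(negatives)
    return pull + push


def joint_loss(logits, label, positives, negatives, weight=0.5, margin=1.0):
    """Weighted sum of the classification loss and the metric loss.
    `weight` is a hypothetical balancing hyperparameter."""
    metric = cluster_margin_loss(positives, negatives, margin)
    return softmax_cross_entropy(logits, label) + weight * metric
```

In a real network the two terms would be computed on the outputs of two separate fully connected branches and minimized by gradient descent; here the features are given directly as lists so the arithmetic can be checked by hand.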
