Regardless of whether deep learning or handcrafted methods are used, the dynamic information in videos and the effect of cross-ethnicity variation are rarely considered in face anti-spoofing. In this work, we propose a static-dynamic fusion mechanism for multi-modal face anti-spoofing. Motivated by the motion divergences between real and fake faces, we feed the dynamic image computed by rank pooling, together with static information, into a convolutional neural network (CNN) for each modality (i.e., RGB, Depth, and infrared (IR)). We then develop a partially shared fusion method to learn complementary information across the modalities. Furthermore, to study the generalization capability of the proposed method against cross-ethnicity attacks and unknown spoofs, we introduce the largest public Cross-ethnicity Face Anti-spoofing (CASIA-CeFA) dataset, covering 3 ethnicities, 3 modalities, 1,607 subjects, and both 2D and 3D attack types. Experiments demonstrate that the proposed method achieves state-of-the-art results on CASIA-CeFA, CASIA-SURF, OULU-NPU, and SiW.
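The dynamic image mentioned above collapses a video clip into a single frame whose pixel values encode temporal ordering. A minimal sketch of this idea is shown below, using the linear approximate rank pooling weights alpha_t = 2t - T - 1 common in the dynamic-image literature; the exact coefficients and preprocessing in the paper's pipeline may differ.

```python
import numpy as np

def dynamic_image(frames: np.ndarray) -> np.ndarray:
    """Approximate rank pooling: collapse a clip of shape (T, H, W, C)
    into one 'dynamic image' via a weighted sum over time.

    Assumes the simple linear weights alpha_t = 2t - T - 1 (a common
    approximation; the paper's exact formulation may differ)."""
    T = frames.shape[0]
    t = np.arange(1, T + 1)
    alpha = 2 * t - T - 1  # weights sum to zero, so static content cancels
    di = np.tensordot(alpha, frames.astype(np.float64), axes=(0, 0))
    # Rescale to [0, 255] so the result can be fed to a CNN like a normal image.
    span = di.max() - di.min()
    di = (di - di.min()) / (span + 1e-8) * 255.0
    return di.astype(np.uint8)
```

Because the weights sum to zero, purely static pixels cancel out and only motion-ordered intensity changes survive, which is why the dynamic image highlights the motion divergences between real and fake faces.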