Learning Disentangled Latent Factors from Paired Data in Cross-Modal Retrieval: An Implicit Identifiable VAE Approach

We address the problem of learning the underlying disentangled latent factors shared between paired bi-modal data in cross-modal retrieval. We assume that the data in both modalities are complex, structured, and high-dimensional (e.g., image and text), a setting in which conventional deep auto-encoding latent variable models such as the Variational Autoencoder (VAE) often struggle to train an accurate decoder or to synthesize realistic samples. A suboptimally trained decoder can harm the model's ability to identify the true factors. In this paper we propose the implicit decoder, which completely removes the ambient data decoding module from a latent variable model by implicitly inverting the encoder, achieved through Jacobian regularization of the low-dimensional embedding function. Motivated by the recent Identifiable VAE (IVAE) model, we modify it to incorporate the query modality data as a conditioning auxiliary input, which allows us to prove that the true parameters of the model can be identified under some regularity conditions. Tested on various datasets where the true factors are fully or partially available, our model identifies the factors accurately and significantly outperforms conventional encoder-decoder latent variable models. We also test our model on Recipe1M, a large-scale food image/recipe dataset, where the factors learned by our approach closely coincide with the most pronounced, widely agreed-upon food factors, including savoriness, wateriness, and greenness.
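
To make the idea of Jacobian regularization of an embedding function concrete, the following is a minimal sketch and not the regularizer used in this work, whose exact form the abstract does not state. It assumes a toy encoder `f(x) = tanh(Wx + b)` and a hypothetical penalty `||J J^T - I||_F^2` on the encoder Jacobian `J`, which is one common way to keep a low-dimensional map locally well-conditioned. The names `encoder`, `jacobian`, and `jacobian_penalty` are illustrative only.

```python
import math


def encoder(x, W, b):
    """Toy embedding function f: R^D -> R^d with f(x) = tanh(W x + b)."""
    return [
        math.tanh(sum(w * xj for w, xj in zip(row, x)) + bi)
        for row, bi in zip(W, b)
    ]


def jacobian(f, x, eps=1e-5):
    """Central finite-difference Jacobian of f at x, returned as d rows of length D."""
    d = len(f(x))
    J = [[0.0] * len(x) for _ in range(d)]
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(d):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J


def jacobian_penalty(J):
    """Assumed regularizer: squared Frobenius norm of (J J^T - I)."""
    d = len(J)
    total = 0.0
    for i in range(d):
        for k in range(d):
            gram = sum(a * c for a, c in zip(J[i], J[k]))
            target = 1.0 if i == k else 0.0
            total += (gram - target) ** 2
    return total


# Example: an encoder whose Jacobian has orthonormal rows at x = 0.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b = [0.0, 0.0]
x = [0.0, 0.0, 0.0]
penalty = jacobian_penalty(jacobian(lambda v: encoder(v, W, b), x))
```

In this example the Jacobian at the origin equals `W`, whose rows are orthonormal, so the penalty is close to zero; scaling `W` makes it grow. In practice such a term would be added to the training loss and differentiated with an automatic differentiation library instead of finite differences.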
