Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization

Tsai-Shien Chen, Aliaksandr Siarohin, Guocheng Gordon Qian, Kuan-Chieh Jackson Wang, Egor Nemchinov, Moayed Haji-Ali, Riza Alp Guler, Willi Menapace, Ivan Skorokhodov, Anil Kag, Jun-Yan Zhu, Sergey Tulyakov

Source: arXiv. Project page: https://snap-research.github.io/omni-attribute

Visual concept personalization aims to transfer only specific image attributes, such as identity, expression, lighting, and style, into unseen contexts. However, existing methods rely on holistic embeddings from general-purpose image encoders, which entangle multiple visual factors and make it difficult to isolate a single attribute. This often leads to information leakage and incoherent synthesis. To address this limitation, we introduce Omni-Attribute, the first open-vocabulary image attribute encoder designed to learn high-fidelity, attribute-specific representations. Our approach jointly designs the data and model: (i) we curate semantically linked image pairs annotated with positive and negative attributes to explicitly teach the encoder what to preserve or suppress; and (ii) we adopt a dual-objective training paradigm that balances generative fidelity with contrastive disentanglement. The resulting embeddings prove effective for open-vocabulary attribute retrieval, personalization, and compositional generation, achieving state-of-the-art performance across multiple benchmarks.
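The abstract does not give the form of the two training objectives, so the following is only a minimal sketch of how a generative term and a contrastive term could be combined for one annotated image pair. Everything in it is an assumption made for illustration: the function names, the cosine-similarity hinge used for the contrastive term, the `margin` and `weight` values, and the weighted-sum combination. It is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_term(pos_pair, neg_pair, margin=0.5):
    """Hypothetical contrastive term for one image pair.

    pos_pair: embeddings of both images under a positive attribute
              (one the pair shares), which should be similar.
    neg_pair: embeddings of both images under a negative attribute
              (one the pair does not share), which should be dissimilar.
    """
    pull = 1.0 - cosine(*pos_pair)                 # small when positives agree
    push = max(0.0, cosine(*neg_pair) - margin)    # zero once negatives are below the margin
    return pull + push


def dual_objective(gen_loss, pos_pair, neg_pair, weight=0.1, margin=0.5):
    """Assumed weighted sum: generative loss plus a scaled contrastive term."""
    return gen_loss + weight * contrastive_term(pos_pair, neg_pair, margin)


if __name__ == "__main__":
    # Toy 2-D embeddings; gen_loss stands in for any reconstruction-style loss value.
    aligned = ([1.0, 0.0], [1.0, 0.0])
    orthogonal = ([1.0, 0.0], [0.0, 1.0])
    print(dual_objective(0.25, pos_pair=aligned, neg_pair=orthogonal))
    print(dual_objective(0.25, pos_pair=orthogonal, neg_pair=aligned))
```

In this toy setup the total equals the generative loss alone when the positive-attribute embeddings coincide and the negative-attribute embeddings are orthogonal, and it grows when those roles are swapped. That mirrors the abstract's description of teaching the encoder what to preserve and what to suppress.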

