Highly Personalized Text Embedding for Image Manipulation by Stable Diffusion - 专知 (Zhuanzhi) Papers

Text embedding · Embedding · Stable Diffusion · Embedding space · Image processing

April 5, 2023

Highly Personalized Text Embedding for Image Manipulation by Stable Diffusion

Inhwa Han, Serin Yang, Taesung Kwon, Jong Chul Ye

Diffusion models have shown superior performance in image generation and manipulation, but the inherent stochasticity presents challenges in preserving and manipulating image content and identity. While previous approaches like DreamBooth and Textual Inversion have proposed model or latent representation personalization to maintain the content, their reliance on multiple reference images and complex training limits their practicality. In this paper, we present a simple yet highly effective approach to personalization using highly personalized (HiPer) text embedding by decomposing the CLIP embedding space for personalization and content manipulation. Our method does not require model fine-tuning or identifiers, yet still enables manipulation of background, texture, and motion with just a single image and target text. Through experiments on diverse target texts, we demonstrate that our approach produces highly personalized and complex semantic image edits across a wide range of tasks. We believe that the novel understanding of the text embedding space presented in this work has the potential to inspire further research across various tasks.
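
The abstract does not spell out how the CLIP embedding space is decomposed, so the following is only a minimal sketch of one plausible reading: the leading tokens of the target-text embedding carry the edit, and a small block of trailing token positions is overwritten by a personalized embedding learned from the single source image. The function name `compose_embedding`, the choice of 5 personalized tokens, and the random stand-in arrays are all assumptions made for illustration and are not taken from the paper; the 77 x 768 shape is that of the CLIP text-encoder output used by Stable Diffusion v1.

```python
import numpy as np

# Output shape of the CLIP text encoder used by Stable Diffusion v1.
SEQ_LEN, DIM = 77, 768


def compose_embedding(text_emb: np.ndarray, personal_emb: np.ndarray) -> np.ndarray:
    """Keep the leading tokens of text_emb and overwrite the trailing
    positions with personal_emb (hypothetical helper, not the authors' code)."""
    n_personal = personal_emb.shape[0]
    if personal_emb.shape[1] != text_emb.shape[1]:
        raise ValueError("embedding dimensions differ")
    if n_personal > text_emb.shape[0]:
        raise ValueError("personalized part is longer than the sequence")
    keep = text_emb.shape[0] - n_personal
    return np.concatenate([text_emb[:keep], personal_emb], axis=0)


rng = np.random.default_rng(0)
# Stand-in for the encoded target text (in practice, a CLIP text-encoder output).
target_text_emb = rng.standard_normal((SEQ_LEN, DIM))
# Stand-in for the personalized part; assumed to have been optimized
# beforehand against the single source image. The length 5 is arbitrary.
personal_emb = rng.standard_normal((5, DIM))

# Conditioning tensor that would be handed to the diffusion model.
cond = compose_embedding(target_text_emb, personal_emb)
print(cond.shape)
```

Under this reading the diffusion model itself stays frozen, which is consistent with the abstract's claim that no model fine-tuning is required; only the small personalized block would be trained.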
