UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer - Zhuanzhi Papers

Person images · Image generation · Model implementation · Diffusion models · Fine-grained

April 18, 2023

UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer

Soon Yau Cheong, Armin Mustafa, Andrew Gilbert

Existing person image generative models can perform either image generation or pose transfer, but not both. We propose a unified diffusion model, UPGPT, as a universal solution for all person image tasks: generation, pose transfer, and editing. With fine-grained multimodality and disentanglement capabilities, our approach offers fine-grained control over the generation and editing of images using a combination of pose, text, and image, all without needing a semantic segmentation mask, which can be challenging to obtain or edit. We also pioneer the use of the parameterized SMPL body model in pose-guided person image generation to demonstrate a new capability: simultaneous pose and camera view interpolation while maintaining a person's appearance. Results on the benchmark DeepFashion dataset show that UPGPT is the new state of the art while simultaneously pioneering new capabilities of editing and pose transfer in human image generation.
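To make the idea of simultaneous pose and camera view interpolation concrete, here is a minimal sketch. It is not the authors' implementation: it assumes the pose is a 72-value SMPL-style vector (24 joints × 3 axis-angle values), that the camera view is a single angle in degrees, and that plain linear interpolation is acceptable for illustration. The function names `lerp` and `interpolate_pose_and_view` are hypothetical, and the step that would feed each interpolated condition to a generative model is omitted.

```python
from typing import List, Tuple

POSE_DIM = 72  # assumption: 24 joints x 3 axis-angle values, as in the standard SMPL pose vector


def lerp(a: List[float], b: List[float], t: float) -> List[float]:
    """Linearly interpolate between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolate_pose_and_view(
    pose_a: List[float],
    pose_b: List[float],
    view_a: float,
    view_b: float,
    steps: int,
) -> List[Tuple[List[float], float]]:
    """Return `steps` (pose, view angle) pairs moving from A to B together."""
    if steps < 2:
        raise ValueError("need at least two steps")
    frames = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the source, 1.0 at the target
        pose = lerp(pose_a, pose_b, t)
        view = (1.0 - t) * view_a + t * view_b
        frames.append((pose, view))
    return frames


# Hypothetical source and target conditions (placeholder values, not real data).
pose_src = [0.0] * POSE_DIM
pose_tgt = [0.5] * POSE_DIM
frames = interpolate_pose_and_view(pose_src, pose_tgt, 0.0, 90.0, 5)
```

Each entry in `frames` is one intermediate condition. In a system like the one the abstract describes, such a condition would guide generation while the appearance inputs stay fixed. Linear blending of axis-angle values is only a rough approximation for large rotations.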
