HIVE: Harnessing Human Feedback for Instructional Visual Editing - Zhuanzhi Papers


March 16, 2023

HIVE: Harnessing Human Feedback for Instructional Visual Editing


Shu Zhang, Xinyi Yang, Yihao Feng, Can Qin, Chia-Chih Chen, Ning Yu, Zeyuan Chen, Huan Wang, Silvio Savarese, Stefano Ermon, Caiming Xiong, Ran Xu

Incorporating human feedback has been shown to be crucial for aligning text generated by large language models with human preferences. We hypothesize that state-of-the-art instructional image editing models, whose outputs are generated from an input image and an editing instruction, could similarly benefit from human feedback, since their outputs may not follow the user's instructions or match the user's preferences. In this paper, we present a novel framework to harness human feedback for instructional visual editing (HIVE). Specifically, we collect human feedback on the edited images and learn a reward function that captures the underlying user preferences. We then introduce scalable diffusion model fine-tuning methods that incorporate human preferences based on the estimated reward. In addition, to mitigate the bias caused by limited data, we contribute a new 1M training dataset, a 3.6K reward dataset for reward learning, and a 1K evaluation dataset to boost the performance of instructional image editing. We conduct extensive quantitative and qualitative experiments, showing that HIVE is favored over previous state-of-the-art instructional image editing approaches by a large margin.
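The abstract names two steps, learning a reward function from human feedback and fine-tuning with the estimated reward, without giving their exact form. As a rough illustration only, the sketch below assumes a Bradley-Terry style pairwise loss for the reward model and exponential reward weighting of per-sample training losses. Both choices, along with the function names and the `temperature` parameter, are assumptions made here for illustration and are not taken from the abstract.

```python
import math


def pairwise_preference_loss(r_preferred: float, r_rejected: float) -> float:
    """Negative log-likelihood that the preferred edit wins (Bradley-Terry assumption).

    Equals -log(sigmoid(r_preferred - r_rejected)), computed in a stable form.
    """
    d = r_preferred - r_rejected
    # log(1 + exp(-d)) rewritten so that exp() never receives a large positive value
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))


def reward_weights(rewards: list[float], temperature: float = 1.0) -> list[float]:
    """Exponential per-sample weights (hypothetical scheme), normalized to average 1."""
    top = max(rewards)
    # Subtracting the maximum keeps exp() from overflowing; it cancels after normalization
    raw = [math.exp((r - top) / temperature) for r in rewards]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


if __name__ == "__main__":
    # Hypothetical reward scores for two edits of the same image and instruction
    print(pairwise_preference_loss(1.5, 0.2))
    # Hypothetical rewards for three training samples
    print(reward_weights([0.1, 0.9, 0.5], temperature=0.5))
```

Under these assumptions, a higher-reward edit receives a larger weight in the fine-tuning loss, and `temperature` controls how sharply the weights favor it.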

