Image captioning has conventionally relied on reference-based automatic evaluations, where machine captions are compared against captions written by humans. This is in stark contrast to the reference-free manner in which humans assess caption quality. In this paper, we report the surprising empirical finding that CLIP (Radford et al., 2021), a cross-modal model pretrained on 400M image+caption pairs from the web, can be used for robust automatic evaluation of image captioning without the need for references. Experiments spanning several corpora demonstrate that our new reference-free metric, CLIPScore, achieves the highest correlation with human judgements, outperforming existing reference-based metrics like CIDEr and SPICE. Information gain experiments demonstrate that CLIPScore, with its tight focus on image-text compatibility, is complementary to existing reference-based metrics that emphasize text-text similarities. Thus, we also present a reference-augmented version, RefCLIPScore, which achieves even higher correlation. Beyond literal description tasks, several case studies reveal domains where CLIPScore performs well (clip-art images, alt-text rating), but also where it is relatively weaker than reference-based metrics, e.g., news captions that require richer contextual knowledge.
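For concreteness, here is a sketch of how such a compatibility score can be formed from CLIP embeddings; the notation is ours, not fixed by the abstract. Writing $\mathbf{c}$ for the CLIP embedding of the candidate caption, $\mathbf{v}$ for that of the image, $R$ for a set of reference-caption embeddings, and $w > 0$ for a rescaling constant, the reference-free score is a clipped cosine similarity,
\[
\mathrm{CLIPScore}(\mathbf{c}, \mathbf{v}) = w \cdot \max\bigl(\cos(\mathbf{c}, \mathbf{v}),\, 0\bigr),
\]
and the reference-augmented variant can combine this with text-text reference similarity via a harmonic mean,
\[
\mathrm{RefCLIPScore}(\mathbf{c}, R, \mathbf{v}) = \operatorname{H\text{-}mean}\Bigl(\mathrm{CLIPScore}(\mathbf{c}, \mathbf{v}),\; \max\bigl(\max_{\mathbf{r} \in R} \cos(\mathbf{c}, \mathbf{r}),\, 0\bigr)\Bigr).
\]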