Image captioning is the task of generating a textual description of an image, based primarily on the objects in the image and the actions they perform.
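
As an illustration, below is a minimal encoder-decoder captioner sketch, assuming a PyTorch environment; the encoder choice (ResNet-18 features), the LSTM decoder, and the vocabulary size and dimensions are illustrative assumptions rather than any specific published model.

```python
# Minimal illustrative encoder-decoder captioner (PyTorch).
# Assumptions: torch/torchvision available; vocab size, dims, and the
# architecture are placeholders, not a specific published model.
import torch
import torch.nn as nn
import torchvision.models as models

class SimpleCaptioner(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=256, hidden_dim=512):
        super().__init__()
        cnn = models.resnet18(weights=None)                        # visual encoder
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])   # drop the fc layer
        self.img_proj = nn.Linear(512, hidden_dim)                  # image feature -> initial state
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # images: (B, 3, H, W); captions: (B, T) token ids (teacher forcing)
        feats = self.encoder(images).flatten(1)                     # (B, 512)
        h0 = torch.tanh(self.img_proj(feats)).unsqueeze(0)          # (1, B, hidden)
        c0 = torch.zeros_like(h0)
        emb = self.embed(captions)                                  # (B, T, embed)
        hidden, _ = self.lstm(emb, (h0, c0))                        # (B, T, hidden)
        return self.out(hidden)                                     # (B, T, vocab) next-token logits

# Usage: next-token scores for a dummy batch of 2 images and 12-token captions.
model = SimpleCaptioner()
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 1000])
```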

Topic: Vision and Language: the text modality in computer vision

Overview: Document image analysis has long been concerned with creating intelligent reading systems, focusing exclusively on understanding textual and graphical information presented in image form. Computer vision at large, on the other hand, shows a growing tendency to exploit multimodal information in various ways; translating from one modality to another and deriving joint embeddings across modalities are two key paradigms. Text is frequently one of the modalities of interest, although it rarely refers to text in image form. In this tutorial, we draw on recent advances in document analysis and computer vision to show how text as a modality is currently handled in state-of-the-art research. We review various methods and applications, focusing on deep learning techniques for multimodal embeddings and cross-modal translation, which provide a very powerful framework for modelling the correlations between textual and visual information. Example applications covered in this tutorial include:

  • Word spotting, which aims to model the correlation between the visual (image) and textual (transcription) representations of a character string (a minimal joint-embedding sketch of this kind of correlation modelling follows this list).

  • Dynamic lexicon generation, which aims to exploit the visual information of a scene to dynamically propose a dictionary of words that are highly likely to appear in the image, as a means of facilitating subsequent scene-text recognition.

  • Self-supervised learning of visual features, where one modality (text) serves as the supervisory signal for another (images), providing a mechanism for learning useful features while avoiding expensive annotation.

  • Cross-modal / multimodal semantic retrieval of images, which aims to model the correlation between visual information and the semantics derived from textual information, enabling cross-modal image retrieval.

  • Image captioning, whose goal is to translate from the visual domain to the textual domain (natural language). An interesting variation on existing methods that we will discuss in this tutorial is how textual information present in the image being described can be integrated into the captioning process.
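
To make the joint-embedding paradigm behind several of the items above (word spotting, cross-modal retrieval) concrete, here is a minimal sketch of two modality-specific encoders projected into a shared space and trained with a contrastive loss; the feature dimensions, linear projection heads, and InfoNCE-style loss are generic assumptions, not the tutorial's actual models.

```python
# Illustrative joint-embedding sketch for cross-modal matching (PyTorch).
# Assumptions: dims, projection heads, and the loss are generic placeholders;
# the tutorial's actual word-spotting / retrieval models may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, img_feat_dim=512, txt_feat_dim=300, joint_dim=128):
        super().__init__()
        self.img_head = nn.Linear(img_feat_dim, joint_dim)   # image branch
        self.txt_head = nn.Linear(txt_feat_dim, joint_dim)   # text branch

    def forward(self, img_feats, txt_feats):
        # Project both modalities into a shared, L2-normalised space.
        z_img = F.normalize(self.img_head(img_feats), dim=-1)
        z_txt = F.normalize(self.txt_head(txt_feats), dim=-1)
        return z_img, z_txt

def contrastive_loss(z_img, z_txt, temperature=0.07):
    # Matching image/text pairs sit on the diagonal of the similarity matrix.
    logits = z_img @ z_txt.t() / temperature          # (B, B)
    targets = torch.arange(z_img.size(0))
    # Symmetric cross-entropy: image->text and text->image retrieval.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Usage with random pre-extracted features (e.g. CNN image features and
# word/transcription embeddings); matching pairs share a row index.
model = JointEmbedding()
z_i, z_t = model(torch.randn(8, 512), torch.randn(8, 300))
print(contrastive_loss(z_i, z_t).item())
```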

Speaker: Dimosthenis Karatzas is an associate professor at the Universitat Autònoma de Barcelona and deputy director of the Computer Vision Centre (CVC) in Barcelona, Spain. At the CVC he leads the Vision and Language research line, working at the intersection of computer vision and text analysis. He has co-authored more than 100 refereed journal and conference publications and has an H-index of 23. He received the 2013 IAPR/ICDAR Young Investigator Award and a 2017 Google Faculty Research Award. He has served in various roles at the main conferences of his field (ICDAR, DAS, CBDAR, ICPR, ICFHR), including chairing IWRR 2014/16/18 and CBDAR 2015/17. He is the lead organiser of the Robust Reading Competition series and chairs the IAPR Technical Committee on Reading Systems. He was a founding member and executive committee member of the SPIE UK chapter, and he is currently a member of the IAPR Education Committee and a member of the IEEE and the IAPR. He is one of the founders of the Library Living Lab, an open, participatory innovation space hosted in a public library.

Latest Content

Image captioning has made substantial progress with huge supporting image collections sourced from the web. However, recent studies have pointed out that captioning datasets, such as COCO, contain the gender bias found in web corpora. As a result, learning models can rely heavily on learned priors and image context for gender identification, leading to incorrect or even offensive errors. To encourage models to learn correct gender features, we reorganize the COCO dataset and present two new splits, the COCO-GB V1 and V2 datasets, in which the train and test sets have different gender-context joint distributions. Models relying on contextual cues suffer from large gender prediction errors on the anti-stereotypical test data. Benchmarking experiments reveal that most captioning models learn gender bias, leading to high gender prediction errors, especially for women. To alleviate the unwanted bias, we propose a new Guided Attention Image Captioning model (GAIC) that provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence. Experimental results validate that GAIC can significantly reduce gender prediction errors while maintaining competitive caption quality. Our code and the designed benchmark datasets are available at https://github.com/datamllab/Mitigating_Gender_Bias_In_Captioning_System.
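
As a rough illustration of the kind of split the abstract describes, the sketch below balances gender-context co-occurrence in a toy test set. This is not the published COCO-GB construction procedure (see the linked repository for that); the keyword lists and the balancing rule are simplified placeholders.

```python
# Illustrative sketch of an "anti-stereotypical" evaluation split.
# Assumptions: NOT the published COCO-GB procedure; keyword lists and
# the balancing rule are simplified placeholders.
from collections import defaultdict

WOMAN_WORDS = {"woman", "women", "girl", "she", "her"}
MAN_WORDS = {"man", "men", "boy", "he", "his"}

def gender_of(caption):
    tokens = set(caption.lower().split())
    if tokens & WOMAN_WORDS and not tokens & MAN_WORDS:
        return "woman"
    if tokens & MAN_WORDS and not tokens & WOMAN_WORDS:
        return "man"
    return None  # ambiguous or no person mentioned

def balanced_test_split(samples, per_context=1):
    """samples: list of (caption, context_label) pairs.
    Keep an equal number of man/woman examples per context label, so that
    context alone no longer predicts gender in the test set."""
    buckets = defaultdict(list)
    for caption, context in samples:
        g = gender_of(caption)
        if g is not None:
            buckets[(context, g)].append((caption, context))
    test = []
    for c in {ctx for ctx, _ in buckets}:
        men = buckets.get((c, "man"), [])
        women = buckets.get((c, "woman"), [])
        k = min(len(men), len(women), per_context)
        test += men[:k] + women[:k]
    return test

# Usage: a toy corpus where "snowboard" co-occurs mostly with men.
data = [("a man riding a snowboard", "snowboard"),
        ("a man on a snowboard jumping", "snowboard"),
        ("a woman riding a snowboard", "snowboard"),
        ("a woman holding an umbrella", "umbrella"),
        ("a man holding an umbrella", "umbrella")]
print(balanced_test_split(data))
```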
