Exploring Better Text Image Translation with Multimodal Codebook


June 2, 2023


Zhibin Lan, Jiawei Yu, Xiang Li, Wen Zhang, Jian Luan, Bin Wang, Degen Huang, Jinsong Su

From arXiv; accepted to the ACL 2023 Main Conference

Text image translation (TIT) aims to translate source text embedded in an image into a target language, a task with a wide range of applications and therefore considerable research value. However, current studies on TIT face two main bottlenecks: (1) the task lacks a publicly available dataset, and (2) dominant models are built in a cascaded manner, which tends to suffer from error propagation from optical character recognition (OCR). In this work, we first annotate a Chinese-English TIT dataset named OCRMT30K, providing convenience for subsequent studies. Then, we propose a TIT model with a multimodal codebook, which is able to associate the image with relevant texts, providing useful supplementary information for translation. Moreover, we present a multi-stage training framework involving text machine translation, image-text alignment, and TIT tasks, which fully exploits additional bilingual texts, an OCR dataset, and our OCRMT30K dataset to train our model. Extensive experiments and in-depth analyses strongly demonstrate the effectiveness of our proposed model and training framework.
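The abstract does not say how the multimodal codebook associates an image with relevant texts. As a purely illustrative sketch, and not the authors' architecture, the general idea of a codebook lookup can be shown as a nearest-neighbour search: an image feature vector is compared against a set of stored code vectors, and the closest entries are averaged into a supplementary vector. The function names (`cosine`, `lookup`, `supplementary_vector`), the use of cosine similarity, and the averaging step are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lookup(codebook, query, k=2):
    """Return the indices of the k codebook entries most similar to the query.

    Hypothetical helper: the paper's actual retrieval rule is not given
    in the abstract.
    """
    ranked = sorted(
        range(len(codebook)),
        key=lambda i: cosine(codebook[i], query),
        reverse=True,
    )
    return ranked[:k]


def supplementary_vector(codebook, query, k=2):
    """Average the k retrieved code vectors into one supplementary vector (assumed design)."""
    indices = lookup(codebook, query, k)
    dim = len(query)
    return [sum(codebook[i][d] for i in indices) / len(indices) for d in range(dim)]


if __name__ == "__main__":
    # Toy codebook of three 2-dimensional code vectors.
    toy_codebook = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    image_feature = [1.0, 0.1]  # stand-in for an encoded image region
    print(lookup(toy_codebook, image_feature))
    print(supplementary_vector(toy_codebook, image_feature))
```

In a real system the code vectors would be learned and shared between image and text encoders; here they are fixed toy values so the lookup itself is easy to follow.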


