Fashion-IQ 2020 Challenge 2nd Place Team's Solution - 专知 (Zhuanzhi) Papers



July 13, 2020

Fashion-IQ 2020 Challenge 2nd Place Team's Solution


Minchul Shin, Yoonjae Cho, Seongwuk Hong

From arXiv: 4 pages, CVPR 2020 Workshop, Fashion IQ Challenge

This paper describes team VAA's approach submitted to the Fashion-IQ challenge at CVPR 2020. Given an image-text pair, we present a novel multimodal composition method, RTIC, that effectively combines the text and image modalities in a semantic space. We extract image features with CNNs and text features with sequential models (e.g., LSTM or GRU). To emphasize the meaning of the feature residual between the target and the candidate, RTIC is composed of N blocks with channel-wise attention modules. We then add the encoded residual to the feature of the candidate image to obtain a synthesized feature. We also explored an ensemble strategy with model variants and achieved a significant performance boost compared to the best single model. Finally, our approach achieved 2nd place in the Fashion-IQ 2020 Challenge with a test score of 48.02 on the leaderboard.
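The composition step described in the abstract (encode a residual through N blocks with channel-wise attention, then add it to the candidate image feature) can be illustrated roughly as follows. This is only a minimal NumPy sketch under stated assumptions, not the authors' architecture: the abstract does not specify how the image and text features are fused, what a block looks like internally, or any dimensions. The concatenate-and-project fusion, the sigmoid gate, the random weights, and the names `ResidualAttentionBlock`, `compose`, and `w_fuse` are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class ResidualAttentionBlock:
    """Hypothetical block: a ReLU transform scaled per channel by a gate,
    plus a skip connection. The real RTIC block may differ."""

    def __init__(self, dim, rng):
        scale = 1.0 / np.sqrt(dim)
        self.w_feat = rng.normal(0.0, scale, (dim, dim))
        self.w_gate = rng.normal(0.0, scale, (dim, dim))

    def __call__(self, x):
        hidden = np.maximum(x @ self.w_feat, 0.0)  # ReLU transform
        gate = sigmoid(x @ self.w_gate)            # channel-wise weights in (0, 1)
        return x + gate * hidden                   # skip connection


def compose(image_feat, text_feat, w_fuse, blocks):
    """Encode a residual from the fused features and add it to the
    candidate image feature (fusion by concatenation is an assumption)."""
    residual = np.concatenate([image_feat, text_feat], axis=-1) @ w_fuse
    for block in blocks:
        residual = block(residual)
    return image_feat + residual


rng = np.random.default_rng(0)
dim, n_blocks, batch = 8, 2, 4                # toy sizes, not from the paper
image_feat = rng.normal(size=(batch, dim))    # stands in for CNN output
text_feat = rng.normal(size=(batch, dim))     # stands in for LSTM/GRU output
w_fuse = rng.normal(0.0, 1.0 / np.sqrt(2 * dim), (2 * dim, dim))
blocks = [ResidualAttentionBlock(dim, rng) for _ in range(n_blocks)]

composed = compose(image_feat, text_feat, w_fuse, blocks)
print(composed.shape)  # (4, 8)
```

In a retrieval setting, the synthesized feature would presumably be compared against the features of target images, for example by cosine similarity; the abstract does not state which similarity measure or loss the team used.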

