稀疏形态器：有限隐变量的稀疏视觉识别 (SparseFormer: Sparse Visual Recognition via Limited Latent Tokens) - 专知论文

会员服务 ·

0

稀疏 · 视觉识别 · 范例 · 隐变量 · 识别 ·

2023 年 4 月 7 日

SparseFormer: Sparse Visual Recognition via Limited Latent Tokens

翻译：稀疏形态器：有限隐变量的稀疏视觉识别

Ziteng Gao,Zhan Tong,Limin Wang,Mike Zheng Shou

from arxiv, Technical report

Human visual recognition is a sparse process, where only a few salient visual cues are attended to rather than traversing every detail uniformly. However, most current vision networks follow a dense paradigm, processing every single visual unit (e.g,, pixel or patch) in a uniform manner. In this paper, we challenge this dense paradigm and present a new method, coined SparseFormer, to imitate human's sparse visual recognition in an end-to-end manner. SparseFormer learns to represent images using a highly limited number of tokens (down to 49) in the latent space with sparse feature sampling procedure instead of processing dense units in the original pixel space. Therefore, SparseFormer circumvents most of dense operations on the image space and has much lower computational costs. Experiments on the ImageNet classification benchmark dataset show that SparseFormer achieves performance on par with canonical or well-established models while offering better accuracy-throughput tradeoff. Moreover, the design of our network can be easily extended to the video classification with promising performance at lower computational costs. We hope that our work can provide an alternative way for visual modeling and inspire further research on sparse neural architectures. The code will be publicly available at https://github.com/showlab/sparseformer

翻译：---- 人类视觉识别是一种稀疏的过程，只关注一些显著的视觉线索，而不是均匀地遍历每一个细节。然而，大多数当前的视觉网络都遵循密集的范例，在均匀的方式处理每一个视觉单元（例如像素或补丁）。在本文中，我们挑战这个密集的范例，并提出了一种新方法，被称为稀疏形态器，以一种端到端的方式模仿人类的稀疏视觉识别。稀疏形态器学习使用一定数量的隐性变量来表示图像，在潜在空间中有稀疏特征采样过程，而不是在原始像素空间中处理稠密单元。因此，稀疏形态器避免了图像空间中的大部分稠密操作，具有更低的计算成本。在ImageNet分类基准数据集上的实验表明，稀疏形态器在提供更好的准确度和吞吐量折衷的同时，实现了与经典或已建立模型相当的性能。此外，我们的网络设计可以轻松扩展到视频分类，具有较低的计算成本和有希望的性能。我们希望我们的工作能够提供一种视觉建模的替代方案，并激发进一步研究稀疏神经结构。该代码将在https://github.com/showlab/sparseformer 上公开发布。

1

相关内容

【CVPR2023】基于图像特定提示学习的零样本生成模型自适应

【CVPR2023】基于图像特定提示学习的零样本生成模型自适应

专知会员服务

31+阅读 · 2023年4月7日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【CVPR 2022】可转移的稀疏对抗性攻击，Transferable Sparse Adversarial Attack

【CVPR 2022】可转移的稀疏对抗性攻击，Transferable Sparse Adversarial Attack

专知会员服务

15+阅读 · 2022年3月12日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

何恺明最新论文！用于计算机视觉的可扩展自监督学习方案Masked AutoEncoders

何恺明最新论文！用于计算机视觉的可扩展自监督学习方案Masked AutoEncoders

专知会员服务

30+阅读 · 2021年11月13日

【ICCV 2021】HCFlow：使用一个统一的框架处理图像超分辨率和图像再缩放

专知会员服务

15+阅读 · 2021年10月4日

替换Transformer！谷歌提出 Performer 模型，全面提升注意力机制！

替换Transformer！谷歌提出 Performer 模型，全面提升注意力机制！

专知会员服务

43+阅读 · 2020年10月29日

【CVPR2020】用于图像超分辨率的深度展开网络，Deep Unfolding Network for Image Super-Resolution

【CVPR2020】用于图像超分辨率的深度展开网络，Deep Unfolding Network for Image Super-Resolution

专知会员服务

44+阅读 · 2020年3月26日

Google AI博客解读论文《Reformer: The Efficient Transformer》，百万量级注意力机制

Google AI博客解读论文《Reformer: The Efficient Transformer》，百万量级注意力机制

专知会员服务

70+阅读 · 2020年1月17日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

专知

19+阅读 · 2018年6月26日

笔记 | Deep active learning for named entity recognition

笔记 | Deep active learning for named entity recognition

黑龙江大学自然语言处理实验室

24+阅读 · 2018年5月27日

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

专知

50+阅读 · 2018年4月25日

【论文推荐】最新七篇视觉问答（VQA）相关论文—差别注意力机制、视觉问题推理、视觉对话、数据可视化、记忆增强网络、显式推理

【论文推荐】最新七篇视觉问答（VQA）相关论文—差别注意力机制、视觉问题推理、视觉对话、数据可视化、记忆增强网络、显式推理

专知

17+阅读 · 2018年4月19日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

论文 | YOLO（You Only Look Once）目标检测

论文 | YOLO（You Only Look Once）目标检测

七月在线实验室

14+阅读 · 2017年12月12日

Copine VII在阿尔茨海默病中的作用机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

受体MDSCs通过CEACAM1-TIM3调控NK细胞功能介导肝移植免疫耐受的机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

精神分裂症易感因子ErbB4对篮状细胞和吊灯状细胞神经环路发育的调控和机制

国家自然科学基金

0+阅读 · 2014年12月31日

少突胶质细胞转录因子Olig2对精神分裂症海马神经发生的调控及机制

国家自然科学基金

0+阅读 · 2014年12月31日

高维稀疏统计模型中的变量选择与检验

国家自然科学基金

1+阅读 · 2014年12月31日

eCB介导的DSI效应在麻醉-觉醒调节中的作用及机制

国家自然科学基金

0+阅读 · 2012年12月31日

EphB受体信号介导的神经干细胞分化机制

国家自然科学基金

0+阅读 · 2012年12月31日

全局轨迹解析的通用框架和推理方法，以及在智能视频监控中的应用

国家自然科学基金

1+阅读 · 2011年12月31日

基于Sparse-Land模型的SAR图像噪声抑制与分割

国家自然科学基金

0+阅读 · 2009年12月31日

G蛋白偶联受体信号调控分子在突触可塑性中的作用及机制

国家自然科学基金

0+阅读 · 2008年12月31日

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

Arxiv

0+阅读 · 2023年5月25日

Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition

Arxiv

0+阅读 · 2023年5月25日

Multi-Modal Mutual Attention and Iterative Interaction for Referring Image Segmentation

Arxiv

0+阅读 · 2023年5月24日

High Speed Human Action Recognition using a Photonic Reservoir Computer

Arxiv

0+阅读 · 2023年5月24日

MMNet: Multi-Mask Network for Referring Image Segmentation

Arxiv

0+阅读 · 2023年5月24日

Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning

Arxiv

11+阅读 · 2023年3月10日

Transformers Meet Visual Learning Understanding: A Comprehensive Review

Arxiv

28+阅读 · 2022年3月24日

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Arxiv

18+阅读 · 2021年4月4日

Knowledge Graph Transfer Network for Few-Shot Recognition

Arxiv

15+阅读 · 2019年11月21日

3D Hand Shape and Pose Estimation from a Single RGB Image

3D Hand Shape and Pose Estimation from a Single RGB Image

Arxiv

17+阅读 · 2019年3月3日

VIP会员

文章信息

相关主题

相关VIP内容

【CVPR2023】基于图像特定提示学习的零样本生成模型自适应

【CVPR2023】基于图像特定提示学习的零样本生成模型自适应

专知会员服务

31+阅读 · 2023年4月7日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【CVPR 2022】可转移的稀疏对抗性攻击，Transferable Sparse Adversarial Attack

【CVPR 2022】可转移的稀疏对抗性攻击，Transferable Sparse Adversarial Attack

专知会员服务

15+阅读 · 2022年3月12日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

何恺明最新论文！用于计算机视觉的可扩展自监督学习方案Masked AutoEncoders

何恺明最新论文！用于计算机视觉的可扩展自监督学习方案Masked AutoEncoders

专知会员服务

30+阅读 · 2021年11月13日

【ICCV 2021】HCFlow：使用一个统一的框架处理图像超分辨率和图像再缩放

专知会员服务

15+阅读 · 2021年10月4日

替换Transformer！谷歌提出 Performer 模型，全面提升注意力机制！

替换Transformer！谷歌提出 Performer 模型，全面提升注意力机制！

专知会员服务

43+阅读 · 2020年10月29日

【CVPR2020】用于图像超分辨率的深度展开网络，Deep Unfolding Network for Image Super-Resolution

【CVPR2020】用于图像超分辨率的深度展开网络，Deep Unfolding Network for Image Super-Resolution

专知会员服务

44+阅读 · 2020年3月26日

Google AI博客解读论文《Reformer: The Efficient Transformer》，百万量级注意力机制

Google AI博客解读论文《Reformer: The Efficient Transformer》，百万量级注意力机制

专知会员服务

70+阅读 · 2020年1月17日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

【CMU博士论文】移动计算摄影的神经场表示

大语言模型遇见法律人工智能：综述

【ICCV2025】InfGen：一种分辨率无关的可扩展图像合成范式

美军用无人地面战车发展：现代战争中超越弹药的多元应用

相关资讯

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

专知

19+阅读 · 2018年6月26日

笔记 | Deep active learning for named entity recognition

笔记 | Deep active learning for named entity recognition

黑龙江大学自然语言处理实验室

24+阅读 · 2018年5月27日

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

专知

50+阅读 · 2018年4月25日

【论文推荐】最新七篇视觉问答（VQA）相关论文—差别注意力机制、视觉问题推理、视觉对话、数据可视化、记忆增强网络、显式推理

【论文推荐】最新七篇视觉问答（VQA）相关论文—差别注意力机制、视觉问题推理、视觉对话、数据可视化、记忆增强网络、显式推理

专知

17+阅读 · 2018年4月19日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

论文 | YOLO（You Only Look Once）目标检测

论文 | YOLO（You Only Look Once）目标检测

七月在线实验室

14+阅读 · 2017年12月12日

相关论文

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

Arxiv

0+阅读 · 2023年5月25日

Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition

Arxiv

0+阅读 · 2023年5月25日

Multi-Modal Mutual Attention and Iterative Interaction for Referring Image Segmentation

Arxiv

0+阅读 · 2023年5月24日

High Speed Human Action Recognition using a Photonic Reservoir Computer

Arxiv

0+阅读 · 2023年5月24日

MMNet: Multi-Mask Network for Referring Image Segmentation

Arxiv

0+阅读 · 2023年5月24日

Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning

Arxiv

11+阅读 · 2023年3月10日

Transformers Meet Visual Learning Understanding: A Comprehensive Review

Arxiv

28+阅读 · 2022年3月24日

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Arxiv

18+阅读 · 2021年4月4日

Knowledge Graph Transfer Network for Few-Shot Recognition

Arxiv

15+阅读 · 2019年11月21日

3D Hand Shape and Pose Estimation from a Single RGB Image

3D Hand Shape and Pose Estimation from a Single RGB Image

Arxiv

17+阅读 · 2019年3月3日

相关基金

Copine VII在阿尔茨海默病中的作用机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

受体MDSCs通过CEACAM1-TIM3调控NK细胞功能介导肝移植免疫耐受的机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

精神分裂症易感因子ErbB4对篮状细胞和吊灯状细胞神经环路发育的调控和机制

国家自然科学基金

0+阅读 · 2014年12月31日

少突胶质细胞转录因子Olig2对精神分裂症海马神经发生的调控及机制

国家自然科学基金

0+阅读 · 2014年12月31日

高维稀疏统计模型中的变量选择与检验

国家自然科学基金

1+阅读 · 2014年12月31日

eCB介导的DSI效应在麻醉-觉醒调节中的作用及机制

国家自然科学基金

0+阅读 · 2012年12月31日

EphB受体信号介导的神经干细胞分化机制

国家自然科学基金

0+阅读 · 2012年12月31日

全局轨迹解析的通用框架和推理方法，以及在智能视频监控中的应用

国家自然科学基金

1+阅读 · 2011年12月31日

基于Sparse-Land模型的SAR图像噪声抑制与分割

国家自然科学基金

0+阅读 · 2009年12月31日

G蛋白偶联受体信号调控分子在突触可塑性中的作用及机制

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员