Many deep learning models, developed in recent years, reach higher ImageNet accuracy than ResNet50, with fewer or comparable FLOPS count. While FLOPs are often seen as a proxy for network efficiency, when measuring actual GPU training and inference throughput, vanilla ResNet50 is usually significantly faster than its recent competitors, offering better throughput-accuracy trade-off. In this work, we introduce a series of architecture modifications that aim to boost neural networks' accuracy, while retaining their GPU training and inference efficiency. We first demonstrate and discuss the bottlenecks induced by FLOPs-optimizations. We then suggest alternative designs that better utilize GPU structure and assets. Finally, we introduce a new family of GPU-dedicated models, called TResNet, which achieve better accuracy and efficiency than previous ConvNets. Using a TResNet model, with similar GPU throughput to ResNet50, we reach 80.7% top-1 accuracy on ImageNet. Our TResNet models also transfer well and achieve state-of-the-art accuracy on competitive datasets such as Stanford cars (96.0%), CIFAR-10 (99.0%), CIFAR-100 (91.5%) and Oxford-Flowers (99.1%). Implementation is available at: https://github.com/mrT23/TResNet

6
下载
关闭预览

相关内容

在这项工作中,我们介绍了一系列的架构修改,旨在提高神经网络的准确性,同时保持他们的GPU训练和推理效率。我们首先演示和讨论由flops优化引起的瓶颈。然后,我们建议更好地利用GPU结构和资产的替代设计。最后,我们介绍了一种新的GPU专用模型,称为TResNet,它比以前的ConvNets具有更好的准确性和效率。使用TResNet模型,与ResNet50的GPU吞吐量相似,在ImageNet上达到80.7%的top-1精度。我们的TResNet模型也能很好地传输竞争数据集,并达到最先进的精度,如Stanford cars(96.0%)、CIFAR-10(99.0%)、CIFAR-100(91.5%)和牛津花卉(99.1%)。实现可在:这个

https://github.com/mrT23/TResNet

成为VIP会员查看完整内容
0
21

Model efficiency has become increasingly important in computer vision. In this paper, we systematically study various neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multi-scale feature fusion; Second, we propose a compound scaling method that uniformly scales the resolution, depth, and width for all backbone, feature network, and box/class prediction networks at the same time. Based on these optimizations, we have developed a new family of object detectors, called EfficientDet, which consistently achieve an order-of-magnitude better efficiency than prior art across a wide spectrum of resource constraints. In particular, without bells and whistles, our EfficientDet-D7 achieves stateof-the-art 51.0 mAP on COCO dataset with 52M parameters and 326B FLOPS1 , being 4x smaller and using 9.3x fewer FLOPS yet still more accurate (+0.3% mAP) than the best previous detector.

0
5
下载
预览

Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available. In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, we propose a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. We demonstrate the effectiveness of this method on scaling up MobileNets and ResNet. To go even further, we use neural architecture search to design a new baseline network and scale it up to obtain a family of models, called EfficientNets, which achieve much better accuracy and efficiency than previous ConvNets. In particular, our EfficientNet-B7 achieves state-of-the-art 84.4% top-1 / 97.1% top-5 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference than the best existing ConvNet. Our EfficientNets also transfer well and achieve state-of-the-art accuracy on CIFAR-100 (91.7%), Flowers (98.8%), and 3 other transfer learning datasets, with an order of magnitude fewer parameters. Source code is at https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet.

0
3
下载
预览

Current state-of-the-art convolutional architectures for object detection are manually designed. Here we aim to learn a better architecture of feature pyramid network for object detection. We adopt Neural Architecture Search and discover a new feature pyramid architecture in a novel scalable search space covering all cross-scale connections. The discovered architecture, named NAS-FPN, consists of a combination of top-down and bottom-up connections to fuse features across scales. NAS-FPN, combined with various backbone models in the RetinaNet framework, achieves better accuracy and latency tradeoff compared to state-of-the-art object detection models. NAS-FPN improves mobile detection accuracy by 2 AP compared to state-of-the-art SSDLite with MobileNetV2 model in [32] and achieves 48.3 AP which surpasses Mask R-CNN [10] detection accuracy with less computation time.

0
7
下载
预览

In this paper, we propose Generative Adversarial Network (GAN) architectures that use Capsule Networks for image-synthesis. Based on the principal of positional-equivariance of features, Capsule Network's ability to encode spatial relationships between the features of the image helps it become a more powerful critic in comparison to Convolutional Neural Networks (CNNs) used in current architectures for image synthesis. Our proposed GAN architectures learn the data manifold much faster and therefore, synthesize visually accurate images in significantly lesser number of training samples and training epochs in comparison to GANs and its variants that use CNNs. Apart from analyzing the quantitative results corresponding the images generated by different architectures, we also explore the reasons for the lower coverage and diversity explored by the GAN architectures that use CNN critics.

0
3
下载
预览

Automatic neural architecture design has shown its potential in discovering powerful neural network architectures. Existing methods, no matter based on reinforcement learning or evolutionary algorithms (EA), conduct architecture search in a discrete space, which is highly inefficient. In this paper, we propose a simple and efficient method to automatic neural architecture design based on continuous optimization. We call this new approach neural architecture optimization (NAO). There are three key components in our proposed approach: (1) An encoder embeds/maps neural network architectures into a continuous space. (2) A predictor takes the continuous representation of a network as input and predicts its accuracy. (3) A decoder maps a continuous representation of a network back to its architecture. The performance predictor and the encoder enable us to perform gradient based optimization in the continuous space to find the embedding of a new architecture with potentially better accuracy. Such a better embedding is then decoded to a network by the decoder. Experiments show that the architecture discovered by our method is very competitive for image classification task on CIFAR-10 and language modeling task on PTB, outperforming or on par with the best results of previous architecture search methods with a significantly reduction of computational resources. Specifically we obtain $2.07\%$ test set error rate for CIFAR-10 image classification task and $55.9$ test set perplexity of PTB language modeling task. The best discovered architectures on both tasks are successfully transferred to other tasks such as CIFAR-100 and WikiText-2.

0
7
下载
预览

Currently, the neural network architecture design is mostly guided by the \emph{indirect} metric of computation complexity, i.e., FLOPs. However, the \emph{direct} metric, e.g., speed, also depends on the other factors such as memory access cost and platform characterics. Thus, this work proposes to evaluate the direct metric on the target platform, beyond only considering FLOPs. Based on a series of controlled experiments, this work derives several practical \emph{guidelines} for efficient network design. Accordingly, a new architecture is presented, called \emph{ShuffleNet V2}. Comprehensive ablation experiments verify that our model is the state-of-the-art in terms of speed and accuracy tradeoff.

0
3
下载
预览

Generative Adversarial Networks (GANs) are a class of deep generative models which aim to learn a target distribution in an unsupervised fashion. While they were successfully applied to many problems, training a GAN is a notoriously challenging task and requires a significant amount of hyperparameter tuning, neural architecture engineering, and a non-trivial amount of "tricks". The success in many practical applications coupled with the lack of a measure to quantify the failure modes of GANs resulted in a plethora of proposed losses, regularization and normalization schemes, and neural architectures. In this work we take a sober view of the current state of GANs from a practical perspective. We reproduce the current state of the art and go beyond fairly exploring the GAN landscape. We discuss common pitfalls and reproducibility issues, open-source our code on Github, and provide pre-trained models on TensorFlow Hub.

0
3
下载
预览

This paper introduces an online model for object detection in videos designed to run in real-time on low-powered mobile and embedded devices. Our approach combines fast single-image object detection with convolutional long short term memory (LSTM) layers to create an interweaved recurrent-convolutional architecture. Additionally, we propose an efficient Bottleneck-LSTM layer that significantly reduces computational cost compared to regular LSTMs. Our network achieves temporal awareness by using Bottleneck-LSTMs to refine and propagate feature maps across frames. This approach is substantially faster than existing detection methods in video, outperforming the fastest single-frame models in model size and computational cost while attaining accuracy comparable to much more expensive single-frame models on the Imagenet VID 2015 dataset. Our model reaches a real-time inference speed of up to 15 FPS on a mobile CPU.

0
11
下载
预览

The work in this paper is driven by the question how to exploit the temporal cues available in videos for their accurate classification, and for human action recognition in particular? Thus far, the vision community has focused on spatio-temporal approaches with fixed temporal convolution kernel depths. We introduce a new temporal layer that models variable temporal convolution kernel depths. We embed this new temporal layer in our proposed 3D CNN. We extend the DenseNet architecture - which normally is 2D - with 3D filters and pooling kernels. We name our proposed video convolutional network `Temporal 3D ConvNet'~(T3D) and its new temporal layer `Temporal Transition Layer'~(TTL). Our experiments show that T3D outperforms the current state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets. The other issue in training 3D ConvNets is about training them from scratch with a huge labeled dataset to get a reasonable performance. So the knowledge learned in 2D ConvNets is completely ignored. Another contribution in this work is a simple and effective technique to transfer knowledge from a pre-trained 2D CNN to a randomly initialized 3D CNN for a stable weight initialization. This allows us to significantly reduce the number of training samples for 3D CNNs. Thus, by finetuning this network, we beat the performance of generic and recent methods in 3D CNNs, which were trained on large video datasets, e.g. Sports-1M, and finetuned on the target datasets, e.g. HMDB51/UCF101. The T3D codes will be released

0
8
下载
预览
小贴士
相关论文
EfficientDet: Scalable and Efficient Object Detection
Mingxing Tan,Ruoming Pang,Quoc V. Le
5+阅读 · 2019年11月20日
Mingxing Tan,Quoc V. Le
3+阅读 · 2019年5月28日
Golnaz Ghiasi,Tsung-Yi Lin,Ruoming Pang,Quoc V. Le
7+阅读 · 2019年4月16日
Generative Adversarial Network Architectures For Image Synthesis Using Capsule Networks
Yash Upadhyay,Paul Schrater
3+阅读 · 2018年11月20日
Hengfui Liau,Nimmagadda Yamini,YengLiong Wong
3+阅读 · 2018年10月16日
Neural Architecture Optimization
Renqian Luo,Fei Tian,Tao Qin,Enhong Chen,Tie-Yan Liu
7+阅读 · 2018年9月5日
ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
Ningning Ma,Xiangyu Zhang,Hai-Tao Zheng,Jian Sun
3+阅读 · 2018年7月30日
Karol Kurach,Mario Lucic,Xiaohua Zhai,Marcin Michalski,Sylvain Gelly
3+阅读 · 2018年7月12日
Mason Liu,Menglong Zhu
11+阅读 · 2018年3月28日
Ali Diba,Mohsen Fayyaz,Vivek Sharma,Amir Hossein Karami,Mohammad Mahdi Arzani,Rahman Yousefzadeh,Luc Van Gool
8+阅读 · 2017年11月22日
相关VIP内容
专知会员服务
44+阅读 · 2020年3月19日
Stabilizing Transformers for Reinforcement Learning
专知会员服务
28+阅读 · 2019年10月17日
[综述]深度学习下的场景文本检测与识别
专知会员服务
45+阅读 · 2019年10月10日
相关资讯
LibRec 精选:AutoML for Contextual Bandits
LibRec智能推荐
6+阅读 · 2019年9月19日
Hierarchically Structured Meta-learning
CreateAMind
13+阅读 · 2019年5月22日
meta learning 17年:MAML SNAIL
CreateAMind
9+阅读 · 2019年1月2日
Ray RLlib: Scalable 降龙十八掌
CreateAMind
5+阅读 · 2018年12月28日
Hierarchical Imitation - Reinforcement Learning
CreateAMind
16+阅读 · 2018年5月25日
Capsule Networks解析
机器学习研究会
10+阅读 · 2017年11月12日
【推荐】YOLO实时目标检测(6fps)
机器学习研究会
17+阅读 · 2017年11月5日
gan生成图像at 1024² 的 代码 论文
CreateAMind
4+阅读 · 2017年10月31日
Adversarial Variational Bayes: Unifying VAE and GAN 代码
CreateAMind
7+阅读 · 2017年10月4日
Auto-Encoding GAN
CreateAMind
5+阅读 · 2017年8月4日
Top